Sean Lahman's Baseball Database - Generic Version
Version 2.2
January 15, 1999

Web site: http://www.baseball1.com
 E-Mail : sean@baseball1.com

This database can also be purchased on CD-ROM.
See http://baseball1.com/stats for more details.

----------------------------------------------------------------------
1.0  Release Contents

The complete database and documentation consist of the following files.

Generic Version:
 Master.csv         Comma-delimited data table of names and bio data
 Batting.csv        Comma-delimited data table of batting stats
 Pitching.csv       Comma-delimited data table of pitching stats
 Fielding.csv       Comma-delimited data table of fielding stats
 AllStars.csv       Comma-delimited data table of all-star data
 HOF.csv            Comma-delimited data table of hall of fame data
 Managers.csv       Comma-delimited data table of managerial data
 Teams.csv          Comma-delimited data table of team stats
 Awards.csv         Comma-delimited data table of award winners
 PostBatting.csv    Comma-delimited data table of post-season batting stats
 PostPitching.csv   Comma-delimited data table of post-season pitching stats
 readme.txt         This documentation file
 missing.txt        Documentation on data that is missing from the database


----------------------------------------------------------------------
1.1 Introduction

This database contains pitching, hitting, and fielding statistics for
Major League Baseball from 1871 through 1998.  It includes data from
the two current leagues (American and National), the four other "major" 
leagues (American Association, Union Association, Players League, and
Federal League), and the National Association of 1871-1875. 

None of what I have done would have been possible without the
pioneering work of Hy Turkin, S.C. Thompson, David Neft, and Pete
Palmer (among others).  All baseball fans owe a debt of gratitude
to the people who have worked so hard to build the tremendous set
of data that we have today.  My thanks also to the many members of
the Society for American Baseball Research who have helped me over
the years.  I strongly urge you to support and join their efforts.
Please visit their website (www.sabr.org).

This database is a result of many years of my work, and that work
will continue.  I have been making sets of this data available on
my web page for several years, but 1996 marked the first time I 
made the database itself available to the public.  I hope that 
others find it as useful and as informative as I have.

This database can never take the place of a good reference book like 
Total Baseball.  But it will enable people to do the kind of queries
and analysis that those traditional sources don't allow.

If you have any problems or find any errors, please let me know.  Any
feedback is appreciated.

----------------------------------------------------------------------
1.2 What's New

The following changes have been made for this version of the database:
 - Updated all data tables through 1998 season
 - Added Player IDs to fielding table and managers table
 - Added tables with post-season batting and pitching stats
 - Added table of award winners
 - Added attendance, park factor, stadium info, and more to team stats

----------------------------------------------------------------------
1.3 Acknowledgements

The improvements made to this version of the database were made 
possible by the extraordinary efforts of three individuals.  Lee 
Sinins integrated the final 1998 stats within a few days of the 
season's end, making it possible to have an update online before the
start of the League Championship Series.  Erik Greenwood spent a lot
of time this summer researching and compiling post-season data and
the awards table. John Northey helped upgrade and add data to
existing tables, and provided a tremendous amount of assistance in
pulling all of the pieces together.

Thanks to all of the people who provided feedback that made this
version of the database better and more accurate than before. I'd
especially like to thank Clifford Otto, Ted Nye, and Keith Woolner
for their assistance over the years.  Many others wrote in with 
corrections and suggestions that make each version so much better 
than anything that preceded it.

The work of the SABR Baseball Records Committee, led by Lyle Spatz,
has been invaluable.  So has the work of Bill Carle and the SABR 
Biographical Committee.

Also thanks to the staff at the National Baseball Library
in Cooperstown who have been so helpful -- Tim Wiles, Jim Gates,
Corey Seeman, Scot Mondore, and others whose names I may have
forgotten.

And a special thanks to Dave Smith and the folks at Retrosheet.  
There is no other group working so hard to compile and share baseball
data.  Their website (www.retrosheet.org) will give you a taste of
the wealth of information Dave and the gang have collected.


----------------------------------------------------------------------
1.4 Registration

If you use this database, please register.  There is no fee. This is
completely FREE!  Registering enables me to keep you updated when
changes are made to the database.  You can register by sending me
e-mail at sean@baseball1.com or by using the online form on my web
page at: http://www.baseball1.com


----------------------------------------------------------------------
1.5 Using this Database - Generic Version

There are eleven CSV files with this release.  Each contains the
data that will constitute a table once imported into a database 
program.  The CSV (comma-separated values) format can be read
by most database applications that I'm aware of.  The first record
(or row) in each file contains the field names.  You will need to 
import each of these tables into your database application, then 
create a relationship linking the LahmanID field in each table that
has one to the LahmanID field in the Master table.  (The Teams table
has no LahmanID field; see section 2.8.)  You will then have fully
assembled the baseball database.
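
As a rough illustration of the import-and-link step described above,
here is a minimal sketch in Python using the standard csv and sqlite3
modules.  It is one possible approach, not a required one.  The rows
below are made-up placeholders standing in for Master.csv and
Batting.csv (with only a few of the real columns), and load_table is
a hypothetical helper name.

```python
import csv
import io
import sqlite3

# Made-up sample rows standing in for Master.csv and Batting.csv.
# The real files have more columns; see sections 2.1 and 2.2.
MASTER_CSV = """LahmanID,LastName,FirstName
1,Example,Alan
2,Sample,Bob
"""
BATTING_CSV = """RecNum,LahmanID,Year,Team,AB,H
1,1,1998,AAA,500,150
2,2,1998,BBB,400,100
3,1,1997,AAA,450,120
"""


def load_table(conn, name, csv_text):
    """Create table `name` from CSV text whose first row holds field names."""
    reader = csv.reader(io.StringIO(csv_text))
    fields = next(reader)  # first record contains the field names
    cols = ", ".join(f'"{f}"' for f in fields)
    marks = ", ".join("?" for _ in fields)
    conn.execute(f'CREATE TABLE "{name}" ({cols})')
    # Remaining records are inserted as-is, so every value is stored as text.
    conn.executemany(f'INSERT INTO "{name}" VALUES ({marks})', reader)


conn = sqlite3.connect(":memory:")
load_table(conn, "Master", MASTER_CSV)
load_table(conn, "Batting", BATTING_CSV)

# Link each batting record to its player through LahmanID.
rows = conn.execute(
    """
    SELECT m.LastName, b.Year, b.H
    FROM Batting AS b
    JOIN Master AS m ON m.LahmanID = b.LahmanID
    ORDER BY b.Year, m.LastName
    """
).fetchall()
```

With the real files, the same helper could be given the text of each
file, for example open("Master.csv", newline="").read().  Because the
sketch stores every value as text, numeric columns would need to be
cast before doing arithmetic on them.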

Section 2.0 of this document describes the organization of the tables
in more detail.  Sections 2.1 through 2.11 describe the fields within each
table.

I am not a database expert, but I will attempt to answer any general
usage problems that may arise.  Please e-mail me if you have problems
or questions.

----------------------------------------------------------------------
1.6 Revision History

     Version      Date            Comments
       1.0      December 1992     Database ported from dBase
       1.1      May 1993          Becomes fully relational
       1.2      July 1993         Corrections made to full database
       1.21     December 1993     1993 statistics added            
       1.3      July 1994         Pre-1900 data added 
       1.31     February 1995     1994 Statistics added
       1.32     August 1995       Statistics added for other leagues
       1.4      September 1995    Fielding Data added 
       1.41     November 1995     1995 statistics added
       1.42     March 1996        HOF/All-Star tables added
       1.5-MS   October 1996      1st public release - MS Access format
       1.5-GV   October 1996      Released generic comma-delimited files
       1.6-MS   December 1996     Updated with 1996 stats, some corrections
       1.61-MS  December 1996     Corrected error in MASTER table
       1.62     February 1997     Corrected 1914-1915 batters data and updated
       2.0      February 1998     Major Revisions-added teams & managers
       2.1      October 1998      Interim release w/1998 stats
       2.2      January 1999      New release w/post-season stats & awards added

------------------------------------------------------------------------------

2.0 Data Tables

The design follows these general principles.  Each player is assigned a
unique number (LahmanID).  All of the information relating to that player
is tagged with his LahmanID.  The LahmanIDs are linked to names and 
birthdates in the MASTER table.

The database is comprised of the following main tables:

  MASTER - Player names, DOB, and biographical info
  Batting - batting statistics
  Pitching - pitching statistics
  Fielding - fielding statistics

It is supplemented by these tables:

  AllStars - All-Star appearances
  HOF - Players in the Hall of Fame
  Managers - managerial statistics
  Teams - yearly stats and standings 
  Awards - award winners
  PostBatting - post-season batting statistics
  PostPitching - post-season pitching statistics

Sections 2.1 through 2.11 of this document describe each of the tables in
detail and the fields that each contains.

---------------------------------------------------------------------------
2.1 MASTER table



LahmanID       A unique number assigned to each player.  The LahmanID
               links the data in this file with records in the other files.
LastName       Player's last name
FirstName      Player's first name
Bats           Player's batting hand
Throws         Player's throwing hand
BirthMonth     Month player was born
BirthDay       Day player was born
BirthYear      Year player was born
DebutYear      Year that player made first major league appearance
------------------------------------------------------------------------------
2.2 Batting Table

RecNum         A unique number to identify each record
LahmanID       Player ID Number
Year           Year
Team           Team
Lg             League
G              Games
AB             At Bats
R              Runs
H              Hits
TB             Total bases
2B             Doubles
3B             Triples
HR             Homeruns
RBI            Runs Batted In
SH             Sacrifice hits
SF             Sacrifice flies
SB             Stolen Bases
CS             Caught Stealing
BB             Base on Balls
IBB            Intentional walks
HPB            Hit by pitch
SO             Strikeouts
POS            Defensive Positions ranked by frequency of appearances
               C=Catcher, 1=First baseman, 2=Second baseman, 3=Third baseman,
               S=Shortstop, O=Outfielder, D=Designated Hitter, H=Pinch-hitter
               only, R=Pinch-runner only, M=Manager

               Based on the standard notation first used by Total 
               Baseball. A preceding asterisk indicates that the player 
               was a starter at the first listed position.  A starter is
               defined as a player who played 100 games at that position 
               (or 2/3 of a team's scheduled games in seasons prior to 1900).
               Any position following a slash "/" indicates that the player 
               appeared in fewer than 10 games at that position.
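
The POS notation described above can be split into its parts with a
few lines of code.  The sketch below is only an illustration: it
assumes every position code is a single character and that a value
looks something like "*O1/3" (an invented example).  parse_pos and
POSITION_NAMES are hypothetical names, and codes not listed above are
passed through unchanged.

```python
# Codes taken from the POS field description; anything else is left as-is.
POSITION_NAMES = {
    "C": "Catcher", "1": "First baseman", "2": "Second baseman",
    "3": "Third baseman", "S": "Shortstop", "O": "Outfield",
    "D": "Designated hitter", "H": "Pinch-hitter only",
    "R": "Pinch-runner only", "M": "Manager",
}


def parse_pos(pos):
    """Split a POS string into starter flag, regular and minor positions."""
    pos = pos.strip()
    # A preceding asterisk marks a starter at the first listed position.
    starter = pos.startswith("*")
    if starter:
        pos = pos[1:]
    # Positions after the slash were played in fewer than 10 games.
    regular, _, minor = pos.partition("/")
    return {
        "starter": starter,
        "regular": list(regular),  # assumes one character per position
        "minor": list(minor),
    }


parsed = parse_pos("*O1/3")
names = [POSITION_NAMES.get(code, code) for code in parsed["regular"]]
```

Here parsed["regular"] keeps the order of the original string, so its
first entry is the position played most often.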

------------------------------------------------------------------------------
2.3 Pitching table

RecNum         A unique number to identify each record
LahmanID       Player ID Number
Year           Year
Team           Team
Lg             League
W              Wins
L              Losses
G              Games
GS             Games Started
CG             Complete Games 
SH             Shutouts
SV             Saves
IP             Innings Pitched (using the non-traditional standard 
               of .3 for 1/3 inning and .7 for 2/3 inning)
H              Hits
ER             Earned Runs
HR             Homeruns
BB             Walks
SO             Strikeouts
ERA            Earned Run Average
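
Because IP stores thirds of an inning as .3 and .7, the value cannot
be used directly in arithmetic.  The following Python sketch shows one
way to convert it to a count of outs and to recompute ERA with the
usual formula (9 x ER / innings pitched).  The function names are
hypothetical, and the sketch assumes the fractional part is always
.0, .3, or .7.

```python
def ip_to_outs(ip):
    """Convert an IP value in the .3/.7 notation to a count of outs."""
    # Format to one decimal so 200.7 and "200.7" are handled the same way.
    whole, _, frac = f"{float(ip):.1f}".partition(".")
    thirds = {"0": 0, "3": 1, "7": 2}
    if frac not in thirds:
        raise ValueError(f"unexpected fractional innings in {ip!r}")
    return int(whole) * 3 + thirds[frac]


def era(earned_runs, ip):
    """Earned runs per nine innings, or None when no outs were recorded."""
    outs = ip_to_outs(ip)
    if outs == 0:
        return None
    # 9 * ER / (outs / 3) simplifies to 27 * ER / outs.
    return 27 * earned_runs / outs
```

For example, ip_to_outs(200.7) gives 602 outs, which is 200 2/3
innings.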

------------------------------------------------------------------------------
2.4 Fielding Table

RecNum         A unique number to identify each record
LahmanID       Player ID Number
LName          Player's last name
FName          Player's first name
Pos            Defensive position
Year           Year
Team           Team
Lg             League
G              Games 
PO             Putouts
A              Assists
E              Errors
DP             Double Plays

------------------------------------------------------------------------------
2.5  All-Star table

RecNum         A unique number to identify each record
LahmanID       Player ID Number
Year           Year
Team           Team
Lg             League

------------------------------------------------------------------------------
2.6  HOF table

RecNum         A unique number to identify each record
LahmanID       Player ID Number
LastName       Player's last name
FirstName      Player's first name
Inducted       Year of induction
By             Method of Induction (BW=Baseball writers, VC=Veterans
               Committee, NL=Committee on the Negro Leagues)
Ballots        Total ballots cast in year of induction
Votes          Total votes received
Pct            Percentage of votes received
Pos            Primary playing position
Category       Type of inductee
------------------------------------------------------------------------------
2.7  Managers table

RecNum         A unique number to identify each record
LahmanID       Player ID Number
Year           Year
Team           Team
Lg             League
Div            Division
G              Games managed
W              Wins
L              Losses
Pct            Winning percentage
Std            Team's final position in standings that year
Order          Managerial order.  Blank if the individual managed the team
               the entire year.  Otherwise denotes where the manager appeared
               in the managerial order (1 of 2, 2 of 2, etc.)
PlyrMgr        Player Manager (denoted by 'Y')
------------------------------------------------------------------------------
2.8  Teams table

RecNum         A unique number to identify each record
Year           Year
Lg             League
Pos            Position in final standings
Team           Team
G              Games played
W              Wins
L              Losses
Pct            Winning percentage
GB             Games behind
R              Runs scored
OR             Opponents runs scored
AB             At bats
H              Hits by batters
2B             Doubles
3B             Triples
HR             Homeruns by batters
BB             Walks by batters
SO             Strikeouts by batters
AVG            Batting average
OBP            On base percentage
SLG            Slugging percentage
SB             Stolen bases
CS             Caught stealing
ERA            Earned run average
CG             Complete games
SHO            Shutouts
SV             Saves
IP             Innings pitched
H-P            Hits allowed
HR-P           Homeruns allowed
BB-P           Walks allowed
SO-P           Strikeouts by pitchers
BPF            Three-year park factor for batters
PPF            Three-year park factor for pitchers
Ballpark Name  Name of team's home ballpark
Attendance     Home attendance total
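
Several of the rate columns above can be recomputed from the counting
columns as a sanity check after import.  The Python sketch below uses
the standard definitions (Pct = W / (W + L), AVG = H / AB, SLG = total
bases / AB).  The sample row is invented, derived_team_stats is a
hypothetical name, and rounding to three places is an assumption
about how the stored values are expressed.

```python
def derived_team_stats(row):
    """Recompute Pct, AVG and SLG from a Teams row given as a dict."""
    w, l = int(row["W"]), int(row["L"])
    ab, h = int(row["AB"]), int(row["H"])
    doubles, triples, hr = int(row["2B"]), int(row["3B"]), int(row["HR"])
    # H already counts one base per hit; add the extra bases.
    total_bases = h + doubles + 2 * triples + 3 * hr
    return {
        "Pct": round(w / (w + l), 3),
        "AVG": round(h / ab, 3),
        "SLG": round(total_bases / ab, 3),
    }


# Invented values for illustration only, as they would arrive from a CSV row.
sample = {"W": "90", "L": "72", "AB": "5500", "H": "1485",
          "2B": "280", "3B": "30", "HR": "150"}
stats = derived_team_stats(sample)
```

OBP is left out because it also needs hit-by-pitch and sacrifice fly
totals, which are not among the fields listed for this table.
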
------------------------------------------------------------------------------
2.9  Awards table

RecNum         A unique number to identify each record
LahmanID       Player ID Number
Award          Name of annual award
Tie            Indicates whether two or more individuals shared the award
Year           Year
Lg             League
Pos            Position for Gold Glove awards, none otherwise

------------------------------------------------------------------------------
2.10  PostBatting table

RecNum         A unique number to identify each record
LahmanID       Player ID Number
Year           Year
Team           Team
Lg             League
Playoff        Level of playoffs (WS = World Series, DIV = Divisional Series,
                 LCS = League Championship Series)
G              Games
AB             At Bats
R              Runs
H              Hits
2B             Doubles
3B             Triples
HR             Homeruns
RBI            Runs Batted In
BB             Base on Balls
SO             Strikeouts
SB             Stolen Bases

------------------------------------------------------------------------------
2.11  PostPitching table

RecNum         A unique number to identify each record
LahmanID       Player ID Number
Year           Year
Team           Team
Lg             League
Playoff        Level of playoffs (WS = World Series, DIV = Divisional Series,
                 LCS = League Championship Series)
W              Wins
L              Losses
SV             Saves
G              Games
CG             Complete Games 
IP             Innings Pitched (using the non-traditional standard 
               of .3 for 1/3 inning and .7 for 2/3 inning)
H              Hits
ER             Earned Runs
BB             Walks
SO             Strikeouts

------------------------------------------------------------------------------

<end of file>