Wednesday, January 21, 2009

Database Documentation using SchemaSpy

One of the largest challenges that developers and software project managers have a difficult time with is how to keep documentation in sync with the code elements.  While there are a large amount of tools that allows for documentation to be created based on code comments (i.e. nDoc, JavaDoc, SandCastle, etc.), there are not a lot of tools that do the same for Databases.  Now, there are tools that will document a database after the fact (or will create the database based on the documentation you're creating), many are very expensive or appear to be no longer support.  One tool that I have started using recently for database documentation was recommended in the book The Productive Programmer by Neal Ford.  The tool is called SchemaSpy.

 

For the sake of brevity, this post will cover how to generate database documentation for only Microsoft SqlServer 2000 and 2005 databases using SchemaSpy.

 

What is SchemaSpy?

SchemaSpy is a Java command-line application that will connect to your database and generate information about the tables and their relationship.  SchemaSpy does does not address any database programming elements (i.e. Functions, Triggers, Stored Procedures, etc.); however, supplies a large amount of information on the tables and views themselves along with their relationships.  It also evaluates your indexes and keys in order to build relational diagrams.  Furthermore, it retrieves information from the database tables in order to give you a view of the number of records, the types of columns, and a large amount of other information for it.  In addition to large amount of information, it also provides notification if there are any anomalies in the schema; such as implied relationships between columns (no foreign key constraints), tables without indexes, columns marked as both "null" and "must be unique", single column tables, and more.

 

What does SchemaSpy Produce?

Below are a few screen shots of the documentation that SchemaSpy creates for the Northwind database.

The Index page for the documentation SchemaSpy produces. 
Figure 1: SchemaSpy's Index page for the Northwind Database.

 

An example of the relational diagrams produced by SchemaSpy
Figure 2: An example of the the diagramed relationships SchemaSpy produces

 

What do I need to start using SchemaSpy?

SchemaSpy requires 4 separate components to be installed in order to work properly:

  • The SchemaSpy application itself (version 4.1.1 as of this post) - Download
  • The Graphviz Diagram Engine - Download
  • Java Runtime Environment (v1.5 or higher)  - Download
  • The Database Driver(s) required for a Java Application to connect to your database.
    • MS SqlServer 2000 / 2005 - JTDS v1.2 - Download

Once the 4 components are downloaded and the JRE and Graphviz are installed, the next thing we have to do is install/configure the JTDS drivers (more on why JTDS instead of traditional JDBC in a moment).  As of this post, SchemaSpy only works with the 1.2 version of the JTDS driver and requires the jtds-1.2.jar file to be in the C:\Program Files\Java\Shared\jtds directory.  The driver jar file can be stored in other directories; however, an additional command parameter has to be passed to SchemaSpy if you do (see below).  At this point, the only thing that's left is to execute SchemaSpy with the proper command line arguments. 

 

Running SchemaSpy

After all of the above components are installed, you are able to execute SchemaSpy from a command prompt using the following (general) syntax:

java -jar SchemaSpy_4.1.1.jar -t databaseType -host serverName -port portNumber -db dbName -u userName -p password -s schemaName -o outputDirectory

Let's dissect these command parameters a bit:

  • -t  : This identifies the type of the database we're using.  Since this post is focusing on MS SqlServer using the JTDS driver, the database type will be either mssql-jtds (for SqlServer 2000) or mssql05-jtds (for SqlServer 2005).
  • -host  :  This is required by our type.  It signifies what database server to connect to.
  • -port  :  This is required by our type.  This signifies what port to connect to the database.  This will not always be 1433 (see below).
  • -db  :  This tells SchemaSpy what database to diagram for us.
  • -u  : This tells SchemaSpy the database user name to login with.
  • -p  : This tells SchemaSpy the database user's password to login with.
  • -s  : This tells SchemaSpy what schema to read and diagram.  The user credentials provided must have access to at least read this schema.
  • -o  : This tells SchemaSpy the output directory to generate the documentation.
  • -dp  : This tells SchemaSpy the path to locate the database driver file (not used in this example)

An example of a fully completed command:

java -jar SchemaSpy_4.1.1.jar -t mssql-jtds -host myDBServer -port 1433 -db Northwind -u myDbUser -p password -s dbo -o NorthwindDocumentation

If your projects use Sql Server's integration authentication option, you can use the -sso option.  The -sso option allows you to remove both the username (-u) and password (-p) parameters and use the identity of the account used to called the Java Runtime Environment.  Modifying the example above to use integrated authentication, we get the following:

java -jar SchemaSpy_4.1.1.jar -t mssql-jtds -host myDBServer -port 1433 -bb Northwind -sso -s dbo -o NorthwindDocumentation

 

Why JTDS instead of the JDBC drivers?

As I mentioned above, I recommend using the JTDS Java Database drivers instead of the traditional JDBC drivers that Microsoft offers off of their web site.  While the fact that JTDS is a single driver as opposed to the 2 separate JDBC drivers is a plus, the main reason is the trouble I had getting the JDBC drivers to work with SqlServer 2005 databases.  If I was working on documenting a database I'm creating for a project on my local system using SQL Server 2005 Express Edition, it would always give me an error stating that the "variant" data type is not supported.  After a lot of research and trial and error, the only result I could only get this working by using the JTDS driver.

 

How to use SchemaSpy to Diagram a Named Instance

When installing Microsoft Sql Server, you have the option of installing it as a stand alone service or a named instance.  In the event of a named instance, you would normally connect to it in your .Net code (or via Management Studio/Query Analyzer) as ServerName\InstanceName.  For example, the default instance of a Sql Server 2005 Express Edition is localhost\SQLExpress.  In order to connect to a named instance database with SchemaSpy, you do not need the Instance Name of the database (just the server name as if it wasn't an instance), and you have to use the database's TCP Dynamic Port instead of the typical 1433.

You can locate a named instance's dynamic port through the following steps:

  1. Open the Sql Server Configuration Manager application (either through the Programs Menu or a MMC Snap In)
  2. Expand the Sql Server 2005 Network Configuration element
  3. Click on the Protocols for "INSTANCE" where "INSTANCE" is the name of your named instance
  4. In the right pane, double click on TCP/IP
  5. In the TCP/IP Properties dialog that appears, click on the IP Addresses tab.
  6. Towards the bottom of the IP Addresses tab, you'll find the TCP Dynamic Ports value to use for SchemaSpy

The above steps were taken based on a local installation of SQL Server 2005 Express Edition.  Your steps may vary depending upon configuration.

 

Conclusion

SchemaSpy is a tool that I have been using to help with some of the documentation pains that I have experienced on some of my personal and professional projects.  Being able to add the command into a build script to automatically keep the documentation up to date is a huge time saver and is invaluable as people migrate onto the project(s).  I hope you find it as useful as I have, and to assist further, I have added a simple step by step guide to setting up and begin using SchemaSpy below. 

 

The Step-By-Step

  1. Install the Java Runtime Environment version 1.5 or higher if it is not already installed.
  2. Install Graphviz Diagramming Engine.
  3. Download the SchemaSpy jar file.
  4. Download the JTDS v1.2 driver and place the jar file in the C:\Program Files\Java\Shared\JTDS directory.
  5. Obtain User Login Credentials with the proper permissions to read the schema for the database you will be documenting if integrated authentication is not available.
  6. Obtain the TCP Dynamic Port if the database you are accessing is a named instance.
  7. Open a command window and navigate to the location of the SchemaSpy jar file.
  8. Execute the SchemaSpy jar file using the proper command line parameters.


kick it on DotNetKicks.com Shout it

22 comments:

  1. Great post! I will have to try this out on my next project.

    -David

    ReplyDelete
  2. Why does it seem that all DB tools are built using Java?

    ReplyDelete
  3. Nice post. documented one server, 4 databases, 15 schemas very quickly.

    ReplyDelete
  4. That was a great post. I am able to generate the documentation using the steps that u had suggested however no database diagrams are being generated. I get a "Failed to query Graphviz version information: java.io.IOException: Cannot run program "dot": CreateProcess error=2" error. Any idea what I could be doing wrong??

    ReplyDelete
  5. never mind. worked by adding the graphwiz dot program in my PATH.

    ReplyDelete
  6. remember to add the jtds-1.2.2-dist\x86\SSO\ntlmauth.dll to your java runtime directory as well if you're going to use SSO

    ReplyDelete
  7. I can't produce schema for:
    java -jar schemaSpy.jar -t sybase -db dbname -host 10.10.10.102 -port 5000 -u sa -o e:\schema -s dbo

    I'm getting this error:
    InvalidConfigurationException: Unexpected named parameter ':table')' found in SQL 'select rowcnt row_count from systabstats where id = object_id(':table') and indid in (0,1)'

    ReplyDelete
  8. @Pepe

    I'm not too familiar with Sybase; however, you can contact John Currier over at SourceForge and he may be able to help you out. He's the primary developer and coordinator for the project and should be able to get it fixed for you if it's truly a bug.


    Contact John Currier via SourceForge.net

    ReplyDelete
  9. thanks for this wonderful blog... Can I use Schema spy to run against sql server 2008 ?

    Hui

    ReplyDelete
  10. @Hui Shi

    That's a great question. I'll run some tests to see what it'll take to connect to Sql 2008 and post the information here or in a follow up post.

    ReplyDelete
  11. Thank you for publishing such an extremely well written article. I had no idea that SchemaSpy had been referenced in another book!

    @Pepe

    That bug was resolved in Subversion revision 535. I need to get my act together and publish a new release, but in the meantime you can grab the latest beta version at http://schemaspy.sourceforge.net/schemaSpy.jar

    ReplyDelete
  12. I am unable to log in. Does the program support when you need to specify a domain? Like "-u DOMAIN\user"? I was thinking that perhaps the / was causing problems or something...

    Failed to connect to database URL [jdbc:jtds:sqlserver://localhost:1433/NTEBBSBS1]

    java.sql.SQLException: Login failed for user 'MYDOMAIN\administrator'.

    ReplyDelete
    Replies
    1. yes, need to supply domain info using this :

      -connprops=domain\=MYDOMAIN

      Delete
  13. Does the JTDS FAQ help? Sorry, but I don't have much experience with JTDS.

    John

    ReplyDelete
  14. Hi, what permissions does the user need to have. I have a user with SELECT privileges to the data dictionary, but that seems insufficient:

    $ java -jar schemaSpy.jar -dp jtds-1.2.5.jar -t sources/daily/fund_master.properties -u ddict_ro -p ddict_ro -o data/fund_master -norows -s fund_master
    Using database properties:
    sources/daily/fund_master.properties
    Connected to Microsoft SQL Server - 09.00.4053

    Gathering schema details...
    Failed to retrieve table comments: java.sql.SQLException: Invalid object name 'sysproperties'.
    SELECT i_s.TABLE_NAME, s.value AS comments FROM INFORMATION_SCHEMA.Tables i_s INNER JOIN sysproperties s ON s.id = OBJECT_ID(i_s.table_catalog + '..' + i_s.table_name) WHERE s.type = 3 AND s.name = 'MS_Description'


    No tables or views were found in schema 'fund_master'.
    The schema does not exist in the database.
    Make sure that you specify a valid schema with the -s option and that
    the user specified (ddict_ro) can read from the schema.
    Note that schema names are usually case sensitive.

    13 schemas exist in this database.
    Some of these "schemas" may be users or system schemas.

    db_accessadmin db_backupoperator db_datareader db_datawriter db_ddladmin db_denydatareader db_denydatawriter db_owner db_securityadmin dbo guest INFORMATION_SCHEMA sys
    These schemas contain tables/views that user 'ddict_ro' can see:

    INFORMATION_SCHEMA sys

    ReplyDelete
  15. Your section regarding JTDS vs. JDBC saved my life. I was using JDBC and kept getting the "variant data type is not supported" error. That error went away when I switched to JTDS.

    ReplyDelete
  16. Maybe you will find interesting another blog post about creation of diagrams out of .sql files (with use of SchemaSpy, Groovy and HSQLDB), see http://tinyurl.com/27tbw6w

    --
    Cheers,
    Tomek Kaczanowski

    ReplyDelete
  17. Thanks so much for this info.

    However I am stuck when trying to connect to a SQL Server 2008.

    It would be very nice if somebody has a clear answer for this: Is SchemaSpy supported for SQL Server 2008?

    I am using SchemaSpy 5 to try to obtain the diagram of an SQL Server 2008 with no luck.

    I am using the JTDS 1.2.5 drivers and Schema Spy 5.0 and Java 1.6.0.21

    The command:

    java -jar "D:\Program Files\SchemaSpy\schemaSpy_5.0.0.jar" -dp "D:\jtds-1.2.5-dist\jtds-1.2.5.jar" -t mssql-jtds -db EnterpriseVaultDirectory -host localhost -port 1433 -u sa -connprops "D:\Program Files\SchemaSpy\properties\gonzalo.properties" -o "D:\Schema"

    I always obtain the following exit:

    Failed to connect to database URL [jdbc:jtds:sqlserver://localhost:1433/EnterpriseVaultDirectory]

    java.sql.SQLException: Login failed for user 'sa'.
    at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:368)
    at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2820)
    at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2258)
    at net.sourceforge.jtds.jdbc.TdsCore.login(TdsCore.java:603)
    at net.sourceforge.jtds.jdbc.ConnectionJDBC2.(ConnectionJDBC2.java:345)
    at net.sourceforge.jtds.jdbc.ConnectionJDBC3.(ConnectionJDBC3.java:50)
    at net.sourceforge.jtds.jdbc.Driver.connect(Driver.java:184)
    at net.sourceforge.schemaspy.SchemaAnalyzer.getConnection(SchemaAnalyzer.java:582)
    at net.sourceforge.schemaspy.SchemaAnalyzer.analyze(SchemaAnalyzer.java:157)
    at net.sourceforge.schemaspy.Main.main(Main.java:42)
    E=3I=3

    No matters if I change the user / password.
    The port it is correct

    Any help will be very welcomed

    Thanks in advance

    ReplyDelete
  18. (The below suggestions were provided via an email conversation. I figured I'd post them here regardless though for readers.)

    Check to make sure you can connect to the database with the "sa" account in Sql Server 2008 Management studio. Normally, using the "sa" account isn't recommended for security purposes so another alternative would be to create a different user account for the database and connect using it. Another option is to try to use single sign on (-sso) instead of supplying a username and password.

    ReplyDelete
  19. The command "dot" is working find (dot is in my PATH) and "dot -V" shows the result:
    "dot - graphviz version 2.26.3 (20100126.1600)"
    But when a execute schemaspy ("java -jar schemaSpy_5.0.0.jar -t pgsql -db prod -u postgres -s bau -o html/ -dp postgresql-9.0-801.jdbc3.jar")
    , it always give me an error message:
    "Failed to query Graphviz version information
    with: /usr/bin/dot -V
    java.lang.NullPointerException"
    Any idea?

    ReplyDelete
  20. Unfortunately I can't reproduce the behavior you're seeing when I use your version details. If you run the beta version (http:/schemaspy.sourceforge.net/schemaSpy.jar) with -loglevel=info then it will dump the full stack trace associated with the exception to help track it down.

    Note that it would be much better to report/resolve problems directly on the SchemaSpy bug tracker.

    John
    http:/schemaspy.sourceforge.net

    ReplyDelete
  21. If you need to re-do database diagrams on a regular basis, I've found a GUI that stores all of the parameters ...
    http://schemaspygui.sourceforge.net/

    I've found it to come in pretty handy, once I figure out all of the parameters that I need/want to pass.

    ReplyDelete