Improve your Database Design with Salix Babylonica

Weeping Willow Tree


NAME

salix - Salix Babylonica: advices about relational database schemas.


SYNOPSIS

salix (create|report|show|drop) (pg|my) <database connection options...>


DESCRIPTION

salix gives design, style, consistency, version and system advices about relational database schemas by querying the standard information schema. The results are put in a salix schema which can be queried with your favorite database relation browsing tool. This command is a simple front-end to run Salix Babylonica SQL scripts over the target database.


ARGUMENTS

When started without any argument, the salix command displays a short help message and the running version. Otherwise, two arguments are expected.

The first argument is the operation to perform.

create

Creates the Salix Babylonica salix schema which holds the advices, as well as salix_information_schema, a normalized copy of the information schema.

report

Generate a simple text report for the raised advices, sufficient to identify where the issues are in the schemas. This operation requires a prior run of create on the database.

show

Show a basic summary of the advices. Use you favorite database browser to explore the salix schema for detailed information, and possibly to fix the database schemas. This operation requires a prior run of create on the database. Use report for more textual details about the raised advices.

drop

Drop Salix Babylonica-related schemas.

The second argument is the target database engine.

pg - PostgreSQL

PostgreSQL does have real catalogs named DATABASEs, which contain several SCHEMAs. Catalogs are isolated one from the other, thus it is necessary to run the tool separately on each catalog.

Other arguments can be used to select PostgreSQL: psql, pgsql, postgres, postgresql.

my - MySQL

MySQL does not have catalogs, as the DATABASE is really a SCHEMA (a kind of directory which stores tables), thus it is enough to run the tool only once against an installation.

Argument mysql can be used to select MySQL as well.

The remainder arguments are directly passed to the database connection command. The underlying permissions of the connection must allow to read the information schema meta-data system tables, and to create the two Salix Babylonica-related PostgreSQL schemas/MySQL databases.

The rational for the argument order is that from the database selection argument onward this is the full connection command to be used by salix.


EXAMPLES

With PostgreSQL:

  sh> salix create pg -U calvin -h server erp
  sh> salix report pg -U calvin -h server erp
  <look at issues...>
  sh> salix drop pg -U calvin -h server erp

With MySQL:

  sh> salix create my -u calvin -h server
  sh> salix report my -u calvin -h server
  <look at issues...>
  sh> salix drop my -u calvin -h server


EXIT STATUS

The command returns the 0 status code if all was okay.

Other error codes are returned by the script:

  1. expecting at least 2 arguments

  2. error on operation name, salix first argument

  3. error on database name, salix second argument

  4. directory for SQL scripts not found

  5. cannot cd to SQL script directory

  6. script for operation on database not found

Finally, error codes can also be returned by the database client command when running the SQL script. See the corresponding documentation for details.


LIST OF ADVICES

A summary of all the advices raised can be found in table salix.summary_of_advices: For each concerned_schema (PostgreSQL schema or MySQL database) and advice title, it shows the advice category, severity, level, abstract, description, holding relation view_name, and nb the total number of occurrences.

The detailed list of Salix Babylonica advices ordered by severity, category and level is:

Schema without any FK (schema design error)

Advice in relation salix.da_schema_wo_FK

Why use a relational database if data are not related at all? Well, that might happen...

No attribute in table (table design error)

Advice in relation salix.da_no_attribute_table

There must be something in a table.

Tables without PK nor Unique (table design error)

Advice in relation salix.da_tables_wo_p_key_and_unique

All tuples must be uniquely defined to be consistant with the set theory. There is no unique subset of attribute which can be promoted as a PK.

FK length mismatch (table consistency error)

Advice in relation salix.da_f_key_match_b_diff_sizes

A Foreign Key should have matching referencing and referenced type sizes.

FK type mismatch (table consistency error)

Advice in relation salix.da_f_key_type_dont_match

A Foreign Key should have matching referencing and referenced types.

Backend engine inconsistency (schema version error)

Advice in relation salix.dbva_engine_inconsistency

Different backends are used in the same database. It may be legitimate to do so if a particular feature of one backend is needed, for instance full text indexes.

SuperUser with weak password (user system error)

Advice in relation salix.sa_trivial_password_superuser

SuperUser with empty or username password.

Nullable attribute rate over 80% (schema design warning)

Advice in relation salix.da_nullable_attribute_rate_too_high

Warning: Most of the time, attributes should be NOT NULL. Too high a rate of nullable attribute may reveal that some fields are lacking a NOT NULL.

Attribute count per table over 40 (table design warning)

Advice in relation salix.da_attribute_count_per_table_too_many

Having so many attributes in the same table may reveal the need for additional relations.

Composite Foreign Key (table design warning)

Advice in relation salix.da_tables_w_composite_f_key

As for primary keys, simple foreign keys are usually better design, and make updates easier.

FK not referencing a PK (table design warning)

Advice in relation salix.da_f_key_not_ref_p_key

A Foreign Key should rather reference a Primary Key.

Integer PK but no other key (table design warning)

Advice in relation salix.da_int_pk_but_no_other_key

A simple integer primary key suggests that some other key must exist in the table.

Isolated Tables (table design warning)

Advice in relation salix.da_isolated_table

In a database design, tables are usually linked together.

Large PK referenced by a FK (table design warning)

Advice in relation salix.da_large_p_keys_referenced_by_f_key

Having large primary keys referenced by a foreign key may reveal data duplication, as the primary key is likely to contain relevant information.

Tables without PK but with Unique (table design warning)

Advice in relation salix.da_tables_wo_primary_key_but_unique

All tables should have a primary key to be consistant with the set theory. A unique constraint may be promoted as the primary key.

Attribute has a pseudo 'NULL' text default (attribute design warning)

Advice in relation salix.da_attribute_pseudo_null_default

Possibly the NULL value was intended instead of the 'NULL' text.

Unique nullable attributes (attribute design warning)

Advice in relation salix.da_unique_nullable_attribute

A unique nullable attribute may be a bad design if NULL does not have a particular semantic.

Destination table and FK in different schemas (table consistency warning)

Advice in relation salix.ca_cross_schema_dst_constraints

A constraint and its destination table are usually in the same schema.

Source table and constraint in different schemas (table consistency warning)

Advice in relation salix.ca_cross_schema_src_constraints

A constraint and its source table should be in the same schema.

Table and index in different schemas (table consistency warning)

Advice in relation salix.ca_cross_schema_indexes

An index and its table should be in the same schema.

Unsafe backend engine used in schema (schema version warning)

Advice in relation salix.dbva_schema_engine_is_unsafe

An unsafe backend (e.g. MyISAM) used at least once lacks referential integrity, transaction support, and is not crash safe.

Schema name length too short (schema style warning)

Advice in relation salix.da_schema_identifiers_length_too_short

A schema name with less than 3 characters is really too short.

Table name length too short (table style warning)

Advice in relation salix.da_table_identifiers_length_too_short

A table name with less than 2 characters is really too short.

Attribute name length too short (attribute style warning)

Advice in relation salix.da_attribute_identifiers_length_too_short

An attribute name with 1 character is really too short.

Homonymous heterogeneous attributes (attribute style warning)

Advice in relation salix.da_attribute_same_name_diff_types

Better avoid using the same attribute name with different types on different tables in the same application, as it may confuse the developer.

Redundant FK (table system warning)

Advice in relation salix.sa_redundant_foreign_keys

Redundant Foreign Keys are costly to maintain.

Redundant indexes (table system warning)

Advice in relation salix.sa_redundant_indexes

Redundant indexes are costly to maintain.

User with weak password (user system warning)

Advice in relation salix.sa_trivial_password_user

User with empty or username password.

Nullable attribute rate in 50-80% (schema design notice)

Advice in relation salix.da_nullable_attribute_rate_quite_high

Notice: Most of the time, attributes should be NOT NULL. Too high a rate of nullable attribute may reveal that some fields are lacking a NOT NULL.

Attribute count per table over 20 (table design notice)

Advice in relation salix.da_attribute_count_per_table_many

Having many attributes in the same table may suggest the need for additional relations.

Non-integer Primary Key (table design notice)

Advice in relation salix.da_non_integer_p_key

Having integer primary keys without specific application semantics make updates easier.

Table with a single attribute (table design notice)

Advice in relation salix.da_single_attribute_table

Possibly some more attributes are needed to have a semantic.

Tables with Composite PK (table design notice)

Advice in relation salix.da_tables_w_composite_p_key

A simple primary key, without specific semantics, is usually a better design, and references through foreign keys are simpler.

Tables linked but in different schemas (table consistency notice)

Advice in relation salix.ca_cross_schema_tables

Linked tables are usually in the same schema.

MySQL is used (database version notice)

Advice in relation salix.dbva_database_is_mysql

MySQL lacks important features of the SQL standard, including missing set operators.

Mixed table name styles (schema style notice)

Advice in relation salix.da_table_names_mixed_style

Better use homogeneous table names.

Schema name length quite short (schema style notice)

Advice in relation salix.da_schema_identifiers_length_quite_short

A schema name with 4 characters is quite short.

Mixed attribute name styles (table style notice)

Advice in relation salix.da_attribute_names_mixed_style

Better use homogeneous attribute names.

Table name length quite short (table style notice)

Advice in relation salix.da_table_identifiers_length_quite_short

A table name with 3 characters is quite short.

Attribute name length quite short (attribute style notice)

Advice in relation salix.da_attribute_identifiers_length_quite_short

An attribute name of 2 characters is quite short (but "id" and "pk").

Attribute named after its table (attribute style notice)

Advice in relation salix.da_attribute_with_table_prefix

An attribute contains the name of its table, which is redundant.

Table without index (table system notice)

Advice in relation salix.sa_table_without_index

Not a single index on a table.

Nullable attribute rate in 20-50% (schema design information)

Advice in relation salix.da_nullable_attribute_rate_high

Information: Most of the time, attributes should be NOT NULL. Too high a rate of nullable attribute may reveal that some fields are lacking a NOT NULL.

Unsafe backend engine used on table (table version information)

Advice in relation salix.dbva_table_engine_is_unsafe

An unsafe backend (e.g. MyISAM) lacks referential integrity, transaction support, and is not crash safe.

Schema name length short (schema style information)

Advice in relation salix.da_schema_identifiers_length_short

A schema name with 5 characters is short.

Table name length short (table style information)

Advice in relation salix.da_table_identifiers_length_short

A table name with 4 characters is short.

Attribute name length short (attribute style information)

Advice in relation salix.da_attribute_identifiers_length_short

An attribute name with 3 characters is short.


ENVIRONMENT

Some environment variables have a direct impact on the sotfware:

SLXBBL_HOME

Where to look for Salix Babylonica SQL files. A default is set when installing Salix, so this variable should not be necessary but when testing or if the files have been moved after the installation.

PATH

Salix Babylonica launches either the mysql or psql database clients to execute some SQL scripts. They are found from the default search path.

Other environment variables might affect the database-specific scripting command launched by the tool. Check the database command documentation for details.


CAVEATS

All softwares have bugs, this is a software, hence...

Beware that this software may destroy all your data, make you lose your hairs, or have any other unexpected side effect.

Do not run the software if you have a MySQL database or a PostgreSQL schema named salix or salix_information_schema, as it could be messed up or even destroyed in the process.

There is now way to restrict the script to report issues only on a subset of schemas from the command line. Filter out the results instead.

The software is named after the Latin word for the weeping willow tree, a.k.a. saule pleureur in French. It has nothing to do with the numerous companies with salix in their names.


LICENCE

GNU GPLv3

Copyright 2008-2012 Alexandre Aillos, Samuel Pilot, Shamil Valeev, Fabien Coelho <salix dot babylonica at coelho dot net>

Salix Babylonica is free software, both inexpensive and with sources. The GNU General Public Licence v3 applies, see http://www.gnu.org/copyleft/gpl.html for details.

The summary is: you get as much as you paid for, and we are not responsible for anything.


REFERENCE

The Salix software has been used to survey relational schemas in open-source software. The survey covers 512 schemas. The results of this study have been published in:

On the Quality of Relational Database Schemas in Open-source Software by Fabien Coelho, Alexandre Aillos, Samuel Pilot and Shamil Valeev. In International Journal on Advances in Software, 2011 vol 4, no 3&4, February 2012. Online at http://www.iariajournals.org/software/soft_v4_n34_2011_paged.pdf.

The above paper is an extended version of:

A Field Analysis of Relational Database Schemas in Open-source Software by Fabien Coelho, Alexandre Aillos, Samuel Pilot and Shamil Valeev. In Third International Conference on Advances in Databases, Knowledge and Data Applications (DBKDA), pp 9-15, St Maarten, The Netherlands Antilles, January 2011. ISBN: 978-1-61208-002-4. Copyright IARIA 2011. Online at http://www.thinkmind.org/index.php.

Note that this tool is inspired by pg-advisor, a proof-of-concept PostgreSQL-specific prototype developed in 2004-2005 by Fabien Coelho.


DOWNLOAD

The latest version of the software should be available from http://www.coelho.net/salix/#download.

Download this version as http://www.coelho.net/salix/slxbbl-1.0.0.tgz.


CHANGES

History of changes and versions:

version 1.0.0 on 2012-01-27 (revision 3084)

Improved documentation. Comments added and improved error handling in salix script. Exchange salix arguments order, so that the operation is the first argument, as it seems more logical. The arguments are possibly silently exchanged for backward compatibility. Derive copyright year automatically from svn commit. Nicer report output for PostgreSQL.

version 0.9.0 on 2011-03-22 (revision 2839)

Minor improvements in some advices. Add explanation string to all advices. Abort on error in PostgreSQL scripts. Add report command. Improved documentation.

version 0.8.0 on 2010-11-13 (revision 2594)

More advices, including 3 from SchemaSpy http://schemaspy.sourceforge.net/. Partial work around small issue in PostgreSQL information_schema.

version 0.7.1 on 2010-08-18 (revision 2201)

One more advice. Minor fix on one advice for excluding some schemas.

version 0.7.0 on 2010-06-29 (revision 2154)

More advices, about weak passwords, indexes and object naming styles. Some bug fixes.

version 0.6.2 on 2009-06-10 (revision 1808)

Check more MySQL backend engines. Improved documentation.

version 0.6.1 on 2009-04-13 (revision 1768)

Fix string literal escapes for MySQL. Improved documentation.

version 0.6.0 on 2009-04-12 (revision 1746)

Add two system advices about weak passwords for PostgreSQL. Add Id keyword to all source files. Use version in tar directory.

version 0.5.0 on 2008-12-10 (revision 1538)

A few bug fixes. One new advice.

version 0.4.0 on 2008-09-21 (revision 1188)

Better documentation. Bug fixes.

version 0.3.0 on 2008-09-01 (revision 1109)

Greatly improved documentation. Better schema names.

version 0.2.0 on 2008-08-31 (revision 1086)

Improved documentation. Rely on sh instead of bash.

version 0.1.0 on 2008-08-30 (revision 1063)

Initial version.