DATA MODELING WITH ENTITY-RELATIONSHIP DIAGRAM

 

 Entity Types

 

 Entity-relationship modeling is a logical representation of the data for an organization or for a business area.  Its popularity stems from factors such as relative ease of use, widespread  Computer Aided Systems Engineering (CASE) tool support, and the belief that entities and relationships are natural modeling concepts in the real world.  E-R models are described by graphical diagrams.  This technique was developed by  Peter Chen to serve as a tool for communications between designers and users.
 

Welcome to the home page of Dr. Peter Chen at Louisiana State University where he holds the position of M. J. Foster Distinguished Chair Professor of Computer Science since 1983.  Dr. Chen is the originator of the Entity-Relationship Model (ER Model), which serves as the foundation of many systems analysis and design methodologies, computer-aided software engineering (CASE) tools, and repository systems including IBM's Repository Manager/MVS and DEC's CDD/Plus. 

To learn to use his method, click here:     Drawing E-R drawings    E-R diagrams is a graphical representation of an entity-relationship model, created to understand user requirements.  Entities are enclosed in rectangles, and attributes are enclosed in ovals.  The key attribute is underlined.  The enclosing line is a double line in case of multivaried attributes and broken in case of derived attributes.  There are lines connecting the entity to its attributes.  Component attributes are connected to composite attributes.  Relationships are shown in diamonds, which could have its own attributes.

Entities can be further classified.
A Strong entity type is an entity that exists independently of other types.  Instances of a strong entity type always have a unique characteristic (called an identifier) that is, an attribute or combination of attributes that uniquely distinguish each occurrence of that entity.  A Weak entity type is an entity type whose existence depends on some other type.  It has no business meaning and is not needed in the E-R diagram without the entity on which it depends.  The entity type on which the weak entity type depends is called the identifying owner (or simply owner for short).  It does not have its own identifier.  It has a partial identifier attribute.  A full identifier may be formed by combining the partial identifier with the identifier of its owner.  For example, a DEPENDENT is a weak entity type.  Its owner is EMPLOYEE, a strong entity type with its identifier Employee_ID.  Weak entity types are enclosed in double rectangles.  The relationship between a weak entity and its owner is enclosed in a double diamond.

For an example of entity relationships, click here:    E-R diagrams

Weak Entity Sets

Entity sets that have no primary key, because there are not sufficient attributes, are called weak entity sets.
Entity sets that have a primary key are called strong entity sets.
There must be a discriminator or partial key in a the weak entity set that allows the distinction of which members of the weak
entity set relate to an entity in the strong entity set.
The identifying dominant entity set is said to own the weak entity set that it identifies.

 Object-Oriented Data Models
 

Controlling Data Integrity:

Data Integrity Controls are controls on the possible value(s) a field can assume.  It is a form of Validation.  They can be built into the physical structure of the fields and enforced by the DBMS.   Some forms are the type of data (numeric or character) and length of a field.  Other examples are:

Default value.             The value a field will assume unless a user enters an explicit value.  This reduces data entry times, since it can be skipped, and it reduces entry errors for common values.

Range Control.            Limits the set of permissible values a field may assume.  It may be numeric lower to upper bound or a set of specific values.  It may change with time.  Y2K problem arose because the year field was limited from 00 to 99. 

Null Control Values.   A null value is an empty value.  Primary keys prohibit null values.  Other required fields may also have a null value control placed on it.  For example Names should not be null.  Some fields such as middle initial may be null.

Referential Integrity.   Referential integrity on a field is a form of range control in which the value of that field must exist as the value in some field in another row of the same of different table.  It only guarantees that some existing cross-referencing value is used, not necessarily the correct one.

Handling Missing Data:

1.                  Use a default value

2.                  Prohibit null values

3.                  Substitute an estimate of the missing value.  If a sales value is missing when computing monthly product sales, use a formula involving the mean of the existing monthly sales values for that product indexed by total sales for that month across all products.  Mark such estimates to warn users.

4.                  Track missing data by setting up a trigger (a routine that will automatically execute when some event occurs or a time period passes).  One trigger could log the missing entry to a file and another trigger runs periodically to create a report of the contents of this log file.

5.                  Perform sensitivity testing so that missing data are ignored unless results are significantly changed; if a student is already getting an “A”, the result of the last quiz is immaterial.

INTEGRITY CONSTRAINTS

The major types of integrity constraints (whose purpose is to facilitate maintaining the accuracy and integrity of data) are:  domain constraints, entity integrity, referential integrity, and operational constraints.

Domain Constraints

A domain is a set of values that may be assigned to an attribute.  A domain definition usually consists of domain name, meaning, size (or length), and allowable values or range.  All of the values that appear in a column or relation must be taken from the same domain.

Entity Integrity

Rule designed to assure that every relation has a primary key, and that the data values for that primary key are all valid and non-null.  It is possible that a particular non-primary key attribute cannot be assigned a data value if there is no applicable value, or the applicable data value is not known.  Example fax number (not applicable) or a forgotten telephone number.   In such cases, a null value may be assigned.  Note that a null value is not zero or a string of blanks.  Inclusion of nulls could sometimes lead to anomalous result.

Referential Integrity

Associations between tables are defined through the use of foreign keys.  For example, the association between the CUSTOMER and DRIVER tables is defined by including the Driver_ID attribute as a foreign key in CUSTOMR.  So, before inserting a new row in the CUSTOMER table, the driver for that order must exist in the DRIVER table.  A referential integrity constraint is a rule that maintains consistency among the rows of two relations.  It states that if there is a foreign key in one relation, either each foreign key value must match a primary key value in the other relation or else the foreign key value must be null.

Operational Constraints

Example:   “A person may purchase a ticket for the all-star game only if that person is a season-ticket holder.”

Managing data security, privacy, and integrity:        Protecting the security, privacy, and integrity of the organizational databases rests with the database administration function. 

USING AND SELECTING INDEXES

Database manipulations require locating a row (or collection of rows) that satisfy some conditions.  Example, all customers in a given zip code, or all students with a particular major.  Scanning every row in a table looking for the desired rows is extremely slow.   Using indexes can speed this process, and defining indexes is an important part of physical database design.  Indexes on a file can be created for either a primary key or a secondary key or both.  The index is a table with two columns:  the key and the address of the record(s) that contain that key value.  For any primary key, there will be only one entry in the index for each key value.

Creating a Primary Key Index

The Driver table has a primary key of Driver_ID.  A primary key index using SQL command is:

            CREATE UNIQUE INDEX DRIVINDEX ON DRIVER(DRIVER_ID)

DRIVINEX is the name of the index file created to store the index entries.  The ON clause specifies which table is being indexed and the column(s) that form the index key.  The UNIQUE keyword makes the DRIVINDEX a primary key index.  When this command is executed, any existing records in the driver table would be indexed.  If there are duplicate values of Driver_ID, the CREATE INDEX command will fail.  Once the index is created, the DBMS will reject any insertion or update of data in the DRIVER table that would violate the uniqueness constraint on DRIVER_Ids.  For composite primary keys, list all the elements of the primary key in the ON clause.

Creating a Secondary Key Index

To retrieve rows of a relation based on values for attributes other than the primary key:

            CREATE INDEX DESCINDX ON PRODUCT(DESCRIPTION) is a SQL command to create an index on the Description field of the Product table.  The term UNIQUE should not be used with secondary key attributes, as values may be repeated. 

When to Use Indexes

There is a trade-off between improved performance for retrievals, and degraded performance for inserting, deleting, and updating the records in a file.