DATA MODELING WITH ENTITY-RELATIONSHIP DIAGRAM
|
|
Entity-relationship
modeling is a logical representation of the data for an organization or for
a business area. Its popularity stems from factors such as relative ease
of use, widespread Computer
Aided Systems Engineering (CASE) tool support, and the belief that entities
and relationships are natural modeling concepts in the real world. E-R
models are described by graphical diagrams. This technique was developed
by Peter Chen to serve as a
tool for communications between designers and users.
|
|
Welcome to the home page of Dr. Peter Chen at Louisiana State University where he holds the position of M. J. Foster Distinguished Chair Professor of Computer Science since 1983. Dr. Chen is the originator of the Entity-Relationship Model (ER Model), which serves as the foundation of many systems analysis and design methodologies, computer-aided software engineering (CASE) tools, and repository systems including IBM's Repository Manager/MVS and DEC's CDD/Plus. |
To learn to use his method, click here: Drawing E-R drawings E-R diagrams is a graphical representation of an entity-relationship model, created to understand user requirements. Entities are enclosed in rectangles, and attributes are enclosed in ovals. The key attribute is underlined. The enclosing line is a double line in case of multivaried attributes and broken in case of derived attributes. There are lines connecting the entity to its attributes. Component attributes are connected to composite attributes. Relationships are shown in diamonds, which could have its own attributes.
Entities can be further classified.
A Strong entity type is an entity that exists independently of other
types. Instances of a strong entity type always have a unique
characteristic (called an identifier) that is, an attribute or
combination of attributes that uniquely distinguish each occurrence of that
entity. A Weak entity type is an entity type whose existence
depends on some other type. It has no business meaning and is not needed
in the E-R diagram without the entity on which it depends. The entity
type on which the weak entity type depends is called the identifying owner (or
simply owner for short). It does not have its own
identifier. It has a partial identifier attribute. A full
identifier may be formed by combining the partial identifier with the
identifier of its owner. For example, a DEPENDENT is a weak entity
type. Its owner is EMPLOYEE, a strong entity type with its identifier
Employee_ID. Weak entity types are enclosed in double rectangles.
The relationship between a weak entity and its owner is enclosed in a double
diamond.
For an example of entity relationships, click here: E-R diagrams
Weak Entity Sets
Entity sets that have no primary key, because there are not sufficient
attributes, are called weak entity sets.
Entity sets that have a primary key are called strong entity sets.
There must be a discriminator or partial key in a the weak entity set that
allows the distinction of which members of the weak
entity set relate to an entity in the strong entity set.
The identifying dominant entity set is said to own the weak entity set that it
identifies.
Controlling Data Integrity:
Data Integrity Controls are controls on the possible value(s) a field can assume. It is a form of Validation. They can be built into the physical structure of the fields and enforced by the DBMS. Some forms are the type of data (numeric or character) and length of a field. Other examples are:
Default value. The value a field will assume unless a user enters an explicit value. This reduces data entry times, since it can be skipped, and it reduces entry errors for common values.
Range Control. Limits the set of permissible values a field may assume. It may be numeric lower to upper bound or a set of specific values. It may change with time. Y2K problem arose because the year field was limited from 00 to 99.
Null Control Values. A null value is an empty value. Primary keys prohibit null values. Other required fields may also have a null value control placed on it. For example Names should not be null. Some fields such as middle initial may be null.
Referential Integrity. Referential integrity on a field is a form of range control in which the value of that field must exist as the value in some field in another row of the same of different table. It only guarantees that some existing cross-referencing value is used, not necessarily the correct one.
Handling Missing Data:
1. Use a default value
2.
Prohibit null values
3.
Substitute an estimate of the missing value. If a sales value is missing when
computing monthly product sales, use a formula involving the mean of the
existing monthly sales values for that product indexed by total sales for that
month across all products. Mark
such estimates to warn users.
4.
Track missing data by setting up a trigger (a routine that
will automatically execute when some event occurs or a time period passes). One trigger could log the missing entry
to a file and another trigger runs periodically to create a report of the
contents of this log file.
5.
Perform sensitivity testing so that missing data are
ignored unless results are significantly changed; if a student is already
getting an “A”, the result of the last quiz is immaterial.
INTEGRITY CONSTRAINTS
The major types of integrity constraints (whose purpose is to facilitate maintaining the accuracy and integrity of data) are: domain constraints, entity integrity, referential integrity, and operational constraints.
Domain Constraints
A domain is a set of values that may be assigned to an attribute. A domain definition usually consists of domain name, meaning, size (or length), and allowable values or range. All of the values that appear in a column or relation must be taken from the same domain.
Entity Integrity
Rule designed to assure that every relation has a primary key, and that the data values for that primary key are all valid and non-null. It is possible that a particular non-primary key attribute cannot be assigned a data value if there is no applicable value, or the applicable data value is not known. Example fax number (not applicable) or a forgotten telephone number. In such cases, a null value may be assigned. Note that a null value is not zero or a string of blanks. Inclusion of nulls could sometimes lead to anomalous result.
Referential Integrity
Associations between tables are defined through the use of foreign keys. For example, the association between the CUSTOMER and DRIVER tables is defined by including the Driver_ID attribute as a foreign key in CUSTOMR. So, before inserting a new row in the CUSTOMER table, the driver for that order must exist in the DRIVER table. A referential integrity constraint is a rule that maintains consistency among the rows of two relations. It states that if there is a foreign key in one relation, either each foreign key value must match a primary key value in the other relation or else the foreign key value must be null.
Operational Constraints
Example: “A person may purchase a ticket for the all-star game only if that person is a season-ticket holder.”
Managing data security, privacy, and integrity: Protecting the security, privacy, and integrity of the organizational databases rests with the database administration function.
USING AND SELECTING INDEXES
Database manipulations require locating a row (or collection of rows) that satisfy some conditions. Example, all customers in a given zip code, or all students with a particular major. Scanning every row in a table looking for the desired rows is extremely slow. Using indexes can speed this process, and defining indexes is an important part of physical database design. Indexes on a file can be created for either a primary key or a secondary key or both. The index is a table with two columns: the key and the address of the record(s) that contain that key value. For any primary key, there will be only one entry in the index for each key value.
Creating a Primary Key Index
The Driver table has a primary key of Driver_ID. A primary key index using SQL command is:
CREATE UNIQUE INDEX DRIVINDEX ON DRIVER(DRIVER_ID)
DRIVINEX is the name of the index file created to store the index entries. The ON clause specifies which table is being indexed and the column(s) that form the index key. The UNIQUE keyword makes the DRIVINDEX a primary key index. When this command is executed, any existing records in the driver table would be indexed. If there are duplicate values of Driver_ID, the CREATE INDEX command will fail. Once the index is created, the DBMS will reject any insertion or update of data in the DRIVER table that would violate the uniqueness constraint on DRIVER_Ids. For composite primary keys, list all the elements of the primary key in the ON clause.
Creating a Secondary Key Index
To retrieve rows of a relation based on values for attributes other than the primary key:
CREATE INDEX DESCINDX ON PRODUCT(DESCRIPTION) is a SQL command to create an index on the Description field of the Product table. The term UNIQUE should not be used with secondary key attributes, as values may be repeated.
When to Use Indexes
There is a trade-off between improved performance for retrievals, and degraded performance for inserting, deleting, and updating the records in a file.