Data Modeling Best Practices¶
A practical guide for designing effective ERDs. Common mistakes and principles to avoid them.
Naming Conventions¶
Entity Naming¶
| Rule | Good Example | Bad Example | Reason |
|---|---|---|---|
| Use singular form | Customer, Order | Customers, OrderList | Entities represent an instance, not a collection |
| Use business terms | Accounts Receivable | AR_TABLE | Logical models use business language |
| Avoid abbreviations | Order Detail | Ord DTL | Don't mix abbreviations in logical names |
| Avoid verbs | Shipment | Ship Items | Entities are nouns |
Column Naming¶
| Rule | Good Example | Bad Example | Reason |
|---|---|---|---|
| Entity prefix (for PK) | Customer No | Number | Makes it clear which number |
| Consistent suffixes | Order Date, Reg Date | Order Day, Reg DT | Same concept = same word |
| Use standard words | Amount (AMT) | Money, Cash | Use words registered in the standard dictionary |
| Clear boolean names | Delete Flag | Flag1 | The meaning of the value should be obvious |
Physical Name Rules¶
| Rule | Example |
|---|---|
| UPPERCASE + UNDERSCORE | CUST_NO, ORD_DT |
| 30 characters or less (Oracle compatibility) | CUST_ORD_DTL_SEQ |
| Avoid reserved words | ORD (instead of ORDER), GRP (instead of GROUP) |
| Use standard abbreviations | NM(Name), NO(Number), DT(Date), AMT(Amount) |
→ Standard abbreviation management: Words, Terms & Domains
Entity Design Principles¶
1. One Entity = One Concept¶
If an entity contains two or more concepts, consider splitting it.
→ Customer and order information are mixed. If the same customer orders multiple times, customer info is duplicated.
2. Define PK for Every Entity¶
Entities without a PK cannot distinguish individual rows, making data integrity impossible to guarantee.
| Entity Type | PK Strategy | Example |
|---|---|---|
| Master (Customer, Product) | Business key or surrogate key | CUST_NO, PROD_CD |
| Transaction (Order, Payment) | Numbering rule-based | ORD_NO (YYYYMMDD + SEQ) |
| Bridge (Enrollment, Assignment) | Composite PK | (STUDENT_NO, COURSE_CD) |
| History (Change Log) | Composite PK | (SOURCE_PK, CHANGE_DTIM) |
3. Normalization Decisions¶
| Level | Key Question | Problem When Violated |
|---|---|---|
| 1NF | Are there repeating groups in columns? | Phone1, Phone2, Phone3... |
| 2NF | Are columns dependent on only part of a composite PK? | Partial dependency → data duplication |
| 3NF | Are columns dependent on non-PK columns? | Transitive dependency → update anomalies |
Practical Normalization Principles
- In the logical modeling phase, normalize up to 3NF
- At physical design time, only intentionally denormalize where performance requires it
- When denormalizing, record the reason in a Note
Relationship Design Principles¶
1:N vs M:N Decision¶
| Question | 1:N | M:N |
|---|---|---|
| "Can one A have multiple B?" | ✅ | — |
| "Can one B also have multiple A?" | — | ✅ |
When you discover an M:N relationship, create a bridge entity to decompose it into two 1:N relationships.
Common Mistakes¶
| Mistake | Problem | Solution |
|---|---|---|
| Missing FK | Relationship exists but no FK | Create relationship in ThinkERD → FK auto-generated |
| Circular reference | A→B→C→A cycle | Review business rules. Confirm if the cycle is truly needed |
| Excessive identifying relationships | All relationships made identifying | Use non-identifying if the child has independent meaning |
| FK overuse | FK set for all code tables | For code-type references, use logical relationships only without FK |
Managing Large-Scale ERDs¶
When You Have 50+ Entities¶
| Strategy | Method |
|---|---|
| Separate with Zones | Create zones per domain for visual grouping |
| Color Coding | Unify entity colors per domain |
| Use View Modes | Logical for business review, physical for dev handoff |
| Use Notes | Record design decisions in notes |
Review Checklist¶
Before reviewing an ERD, verify the following:
- Is a PK defined for every entity?
- Are entity names singular business terms?
- Are cardinality and optionality correct?
- Are there unnecessary column duplications? (Check normalization)
- Are identifying/non-identifying relationships appropriate?
- Are standard words/domains applied?
- Is code-type data separated into its own entities?