Friday, March 12, 2010
EntityManager merge() vs. persist()
So the question is: why do we have persist() when merge() can handle both creating and updating entries?
Here it goes -
1) Performance and Memory
Persist simply schedules the entity for insertion into the database, but merge first has to figure out whether the entry already exists in the database to decide between a create and an update scenario. Secondly, merge copies the state of the passed object and saves that copy into the database, so if the entity graph is complex this copying is more time consuming and somewhat memory intensive.
2) RBAC
Some roles may only be allowed to update certain entries and not create new ones, or vice versa, so to have that segregation we have separate APIs for persist and merge.
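A minimal sketch of the difference (assuming an open EntityManager named entityManager, an Order entity with a customerName property, and a detached instance detachedOrder; all names are illustrative):
// persist(): for brand-new objects; the passed instance itself becomes managed
Order order = new Order();
order.setCustomerName("Mary Jackson");
entityManager.persist(order); // scheduled for INSERT; no existence check needed
// merge(): works for new and detached objects; the provider may load the
// existing row first, then copies the state onto a managed copy
Order managed = entityManager.merge(detachedOrder);
// keep working with 'managed' -- 'detachedOrder' stays detached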
Sunday, February 28, 2010
Flush, Persist and Merge
There are two flush modes -
1) Auto
2) Commit
Auto flush is the default: it flushes [persists] the entities in the persistence context at transaction commit time and also before every query executed within a transaction. The exception to this is find(), because if we want to find an entity that has been modified, find() can return the entity as it already exists in the persistence context. But when we execute a query, the query doesn't return whole entities from the context; it returns selected fields as a list read from the database, and this is the reason the persistence context is flushed before query execution.
Commit mode will flush the persistence context when the transaction commits.
We can force flush by calling the flush method on the entity manager to flush the persistence context.
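For example (a sketch assuming an open EntityManager named entityManager):
// Commit mode: flush only when the transaction commits
entityManager.setFlushMode(FlushModeType.COMMIT);
// Auto mode (the default): also flush before queries within a transaction
entityManager.setFlushMode(FlushModeType.AUTO);
// Force pending changes to be written to the database right now
entityManager.flush();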
Persist and Merge
Merge can persist but persist cannot merge.
When an entity is persisted, we can get its primary key even before the transaction is committed, provided the primary key is auto-generated using the table strategy.
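A sketch of that behaviour (assuming an Item entity whose id is generated with GenerationType.TABLE and a resource-local transaction):
entityManager.getTransaction().begin();
Item item = new Item();
entityManager.persist(item);
// the table generator allocates the key at persist() time,
// so the id is already available before the commit
int id = item.getId();
entityManager.getTransaction().commit();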
Persistence Context and Entity Manager
The life cycle of persistence context depends on whether it is transaction-scoped or extended persistence context.
Transaction-Scoped Persistence Context -
The transaction begins when the bean method is invoked and ends when the method completes. Similarly, a transaction-scoped persistence context follows the transaction: it is created when the transaction begins and ends after the transaction commits or rolls back.
Extended Persistence Context -
This type of persistence context is independent of transactions, i.e., its creation and destruction do not depend on transaction begin or end as in the case of a transaction-scoped persistence context. The persistence context is created when the stateful session bean is created and is destroyed when the stateful session bean is destroyed.
Note:
There can be many instances of EntityManager referring to the same single instance of a persistence context if all these entity managers take part in the same transaction, i.e., entity manager instances in different EJBs invoked by an EJB that starts and ends the transaction.
Tuesday, November 24, 2009
Optimistic Concurrency Control by enabling versioning in Hibernate
The following article is an excerpt from the great book - "Java Persistence with Hibernate"
Choosing an isolation level
Developers (ourselves included) are often unsure what transaction isolation level
to use in a production application. Too great a degree of isolation harms scalability
of a highly concurrent application. Insufficient isolation may cause subtle,
unreproducible bugs in an application that you’ll never discover until the system
is working under heavy load.
Note that we refer to optimistic locking (with versioning) in the following explanation,
a concept explained later in this chapter. You may want to skip this section
and come back when it’s time to make the decision for an isolation level in your
application. Picking the correct isolation level is, after all, highly dependent on
your particular scenario. Read the following discussion as recommendations, not
carved in stone.
Hibernate tries hard to be as transparent as possible regarding transactional
semantics of the database. Nevertheless, caching and optimistic locking affect
these semantics. What is a sensible database isolation level to choose in a Hibernate
application?
First, eliminate the read uncommitted isolation level. It’s extremely dangerous to
use one transaction’s uncommitted changes in a different transaction. The rollback
or failure of one transaction will affect other concurrent transactions. Rollback
of the first transaction could bring other transactions down with it, or
perhaps even cause them to leave the database in an incorrect state. It’s even possible
that changes made by a transaction that ends up being rolled back could be
committed anyway, because they could be read and then propagated by another
transaction that is successful!
Secondly, most applications don’t need serializable isolation (phantom reads
aren’t usually problematic), and this isolation level tends to scale poorly. Few
existing applications use serializable isolation in production, but rather rely on
pessimistic locks (see next sections) that effectively force a serialized execution of
operations in certain situations.
This leaves you a choice between read committed and repeatable read. Let’s first
consider repeatable read. This isolation level eliminates the possibility that one
transaction can overwrite changes made by another concurrent transaction (the
second lost updates problem) if all data access is performed in a single atomic
database transaction. A read lock held by a transaction prevents any write lock a
concurrent transaction may wish to obtain. This is an important issue, but
enabling repeatable read isn’t the only way to resolve it.
Let’s assume you’re using versioned data, something that Hibernate can do for
you automatically. The combination of the (mandatory) persistence context
cache and versioning already gives you most of the nice features of repeatable
read isolation. In particular, versioning prevents the second lost updates problem,
and the persistence context cache also ensures that the state of the persistent
instances loaded by one transaction is isolated from changes made by other transactions.
So, read-committed isolation for all database transactions is acceptable if
you use versioned data.
Repeatable read provides more reproducibility for query result sets (only for
the duration of the database transaction); but because phantom reads are still
possible, that doesn’t appear to have much value. You can obtain a repeatable-
read guarantee explicitly in Hibernate for a particular transaction and piece
of data (with a pessimistic lock).
Setting the transaction isolation level allows you to choose a good default locking
strategy for all your database transactions. How do you set the isolation level?
Setting an isolation level
Every JDBC connection to a database is in the default isolation level of the DBMS—
usually read committed or repeatable read. You can change this default in the
DBMS configuration. You may also set the transaction isolation for JDBC connections
on the application side, with a Hibernate configuration option:
hibernate.connection.isolation = 4
Hibernate sets this isolation level on every JDBC connection obtained from a
connection pool before starting a transaction. The sensible values for this option
are as follows (you may also find them as constants in java.sql.Connection):
■ 1—Read uncommitted isolation
■ 2—Read committed isolation
■ 4—Repeatable read isolation
■ 8—Serializable isolation
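For example, the value 4 used above maps to a constant in java.sql.Connection:
import java.sql.Connection;
// hibernate.connection.isolation = 4 is the same level as:
int level = Connection.TRANSACTION_REPEATABLE_READ; // == 4
// 1 == TRANSACTION_READ_UNCOMMITTED, 2 == TRANSACTION_READ_COMMITTED,
// 8 == TRANSACTION_SERIALIZABLE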
Note that Hibernate never changes the isolation level of connections obtained
from an application server-provided connection pool in a managed environment!
You can change the default isolation using the configuration of your application
server. (The same is true if you use a stand-alone JTA implementation.)
As you can see, setting the isolation level is a global option that affects all connections
and transactions. From time to time, it’s useful to specify a more restrictive
lock for a particular transaction. Hibernate and Java Persistence rely on
optimistic concurrency control, and both allow you to obtain additional locking
guarantees with version checking and pessimistic locking.
An optimistic approach always assumes that everything will be OK and that conflicting
data modifications are rare. Optimistic concurrency control raises an
error only at the end of a unit of work, when data is written. Multiuser applications
usually default to optimistic concurrency control and database connections
with a read-committed isolation level. Additional isolation guarantees are
obtained only when appropriate; for example, when a repeatable read is required.
This approach guarantees the best performance and scalability.
Understanding the optimistic strategy
To understand optimistic concurrency control, imagine that two transactions read
a particular object from the database, and both modify it. Thanks to the read-committed
isolation level of the database connection, neither transaction will run into any dirty reads. However, reads are still nonrepeatable, and updates may also be
lost. This is a problem you’ll face when you think about conversations, which are
atomic transactions from the point of view of your users. Look at figure 10.6.
Let’s assume that two users select the same piece of data at the same time. The
user in conversation A submits changes first, and the conversation ends with a successful
commit of the second transaction. Some time later (maybe only a second),
the user in conversation B submits changes. This second transaction also commits
successfully. The changes made in conversation A have been lost, and (potentially
worse) modifications of data committed in conversation B may have been based
on stale information.
You have three choices for how to deal with lost updates in these second transactions
in the conversations:
■ Last commit wins—Both transactions commit successfully, and the second
commit overwrites the changes of the first. No error message is shown.
■ First commit wins—The transaction of conversation A is committed, and the
user committing the transaction in conversation B gets an error message.
The user must restart the conversation by retrieving fresh data and go
through all steps of the conversation again with nonstale data.
■ Merge conflicting updates—The first modification is committed, and the transaction
in conversation B aborts with an error message when it’s committed.
The user of the failed conversation B may however apply changes selectively,
instead of going through all the work in the conversation again.
If you don’t enable optimistic concurrency control, and by default it isn’t enabled,
your application runs with a last commit wins strategy. In practice, this issue of lost
updates is frustrating for application users, because they may see all their work
lost without an error message.
Figure 10.6: Conversation B overwrites changes made by conversation A.
Obviously, first commit wins is much more attractive. If the application user of
conversation B commits, he gets an error message that reads, Somebody already committed
modifications to the data you’re about to commit. You’ve been working with stale
data. Please restart the conversation with fresh data. It’s your responsibility to design
and write the application to produce this error message and to direct the user to
the beginning of the conversation. Hibernate and Java Persistence help you with
automatic optimistic locking, so that you get an exception whenever a transaction
tries to commit an object that has a conflicting updated state in the database.
Merge conflicting changes is a variation of first commit wins. Instead of displaying
an error message that forces the user to go back all the way, you offer a dialog that
allows the user to merge conflicting changes manually. This is the best strategy
because no work is lost and application users are less frustrated by optimistic concurrency
failures. However, providing a dialog to merge changes is much more
time-consuming for you as a developer than showing an error message and forcing
the user to repeat all the work. We’ll leave it up to you whether you want to use
this strategy.
Optimistic concurrency control can be implemented many ways. Hibernate
works with automatic versioning.
Enabling versioning in Hibernate
Hibernate provides automatic versioning. Each entity instance has a version,
which can be a number or a timestamp. Hibernate increments an object’s version
when it’s modified, compares versions automatically, and throws an exception if a
conflict is detected. Consequently, you add this version property to all your persistent
entity classes to enable optimistic locking:
public class Item {
...
private int version;
...
}
You can also add a getter method; however, version numbers must not be modified
by the application. The <version> property mapping in XML must be placed
immediately after the identifier property mapping:
<class name="Item" table="ITEM">
<id .../>
<version name="version" access="field" column="OBJ_VERSION"/>
...
</class>
The version number is just a counter value—it doesn’t have any useful semantic
value. The additional column on the entity table is used by your Hibernate application.
Keep in mind that all other applications that access the same database can
(and probably should) also implement optimistic versioning and utilize the same
version column. Sometimes a timestamp is preferred (or exists):
public class Item {
...
private Date lastUpdated;
...
}
<class name="Item" table="ITEM">
<id .../>
<timestamp name="lastUpdated"
access="field"
column="LAST_UPDATED"/>
...
</class>
In theory, a timestamp is slightly less safe, because two concurrent transactions
may both load and update the same item in the same millisecond; in practice,
this won’t occur because a JVM usually doesn’t have millisecond accuracy (you
should check your JVM and operating system documentation for the guaranteed
precision).
Furthermore, retrieving the current time from the JVM isn’t necessarily safe in
a clustered environment, where nodes may not be time synchronized. You can
switch to retrieval of the current time from the database machine with the
source="db" attribute on the
support this (check the source of your configured dialect), and there is
always the overhead of hitting the database for every increment.
We recommend that new projects rely on versioning with version numbers, not
timestamps.
Optimistic locking with versioning is enabled as soon as you add a <version>
or a <timestamp> property to a persistent class mapping. There is no other switch.
How does Hibernate use the version to detect a conflict?
Automatic management of versions
Every DML operation that involves the now versioned Item objects includes a version
check. For example, assume that in a unit of work you load an Item from the
database with version 1. You then modify one of its value-typed properties, such as
the price of the Item. When the persistence context is flushed, Hibernate detects
that modification and increments the version of the Item to 2. It then executes
the SQL UPDATE to make this modification permanent in the database:
update ITEM set INITIAL_PRICE='12.99', OBJ_VERSION=2
where ITEM_ID=123 and OBJ_VERSION=1
If another concurrent unit of work updated and committed the same row, the
OBJ_VERSION column no longer contains the value 1, and the row isn’t updated.
Hibernate checks the row count for this statement as returned by the JDBC
driver—which in this case is the number of rows updated, zero—and throws a
StaleObjectStateException. The state that was present when you loaded the
Item is no longer present in the database at flush-time; hence, you’re working
with stale data and have to notify the application user. You can catch this exception
and display an error message or a dialog that helps the user restart a conversation
with the application.
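A sketch of such a catch block (assuming a Hibernate Session named session; the price setter is illustrative):
try {
Item item = (Item) session.get(Item.class, new Long(123));
item.setPrice(new BigDecimal("12.99")); // illustrative setter
session.flush(); // Hibernate issues the versioned UPDATE here
} catch (StaleObjectStateException e) {
// zero rows matched ITEM_ID with the expected OBJ_VERSION: the data is
// stale, so notify the user and restart the conversation with fresh data
}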
What modifications trigger the increment of an entity’s version? Hibernate
increments the version number (or the timestamp) whenever an entity instance is
dirty. This includes all dirty value-typed properties of the entity, no matter if
they’re single-valued, components, or collections. Think about the relationship
between User and BillingDetails, a one-to-many entity association: If a CreditCard
is modified, the version of the related User isn’t incremented. If you add or
remove a CreditCard (or BankAccount) from the collection of billing details, the
version of the User is incremented.
If you want to disable automatic increment for a particular value-typed property
or collection, map it with the optimistic-lock="false" attribute. The
inverse attribute makes no difference here. Even the version of an owner of an
inverse collection is updated if an element is added or removed from the
inverse collection.
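For example, to exclude a single property from version increments (a sketch extending the earlier mapping; the description property is an illustrative assumption):
<class name="Item" table="ITEM">
<id .../>
<version name="version" access="field" column="OBJ_VERSION"/>
<!-- changes to this property alone won't increment the version -->
<property name="description" optimistic-lock="false"/>
...
</class>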
As you can see, Hibernate makes it incredibly easy to manage versions for optimistic
concurrency control. If you’re working with a legacy database schema or
existing Java classes, it may be impossible to introduce a version or timestamp
property and column. Hibernate has an alternative strategy for you.
Transaction Isolation Issues
Dirty Read
A dirty read occurs if one transaction reads changes made by another transaction
that has not yet been committed. This is dangerous, because the changes made by
the other transaction may later be rolled back, and invalid data may be written by
the first transaction.
Non-repeatable Read
An unrepeatable read occurs if a transaction reads a row twice and reads different
state each time. For example, another transaction may have written to the row
and committed between the two reads.
Phantom Read
A phantom read is said to occur when a transaction executes a query twice, and
the second result set includes rows that weren’t visible in the first result set or rows
that have been deleted. (It need not necessarily be exactly the same query.) This
situation is caused by another transaction inserting or deleting rows between the
execution of the two queries.
Lost update
A lost update occurs if two transactions both update a row and then the second
transaction aborts, causing both changes to be lost. This occurs in systems that
don’t implement locking.
Transaction Isolation Levels
Read Uncommitted
A system that permits dirty reads but not lost updates is said to operate in
read uncommitted isolation. One transaction may not write to a row if another
uncommitted transaction has already written to it. Any transaction may read
any row, however.
Read Committed
A system that permits unrepeatable reads but not dirty reads is said to implement
read committed transaction isolation. This may be achieved by using
shared read locks and exclusive write locks. Reading transactions don’t
block other transactions from accessing a row. However, an uncommitted
writing transaction blocks all other transactions from accessing the row.
Repeatable Read
A system operating in repeatable read isolation mode permits neither unrepeatable
reads nor dirty reads. Phantom reads may occur. Reading transactions
block writing transactions (but not other reading transactions), and
writing transactions block all other transactions.
Serializable
Serializable provides the strictest transaction isolation. This isolation level
emulates serial transaction execution, as if transactions were executed one
after another, serially, rather than concurrently. Serializability may not be
implemented using only row-level locks. There must instead be some other
mechanism that prevents a newly inserted row from becoming visible to a
transaction that has already executed a query that would return the row.
Saturday, September 12, 2009
Saving (detached) entities
Saving an entity in JPA is simple, right? We just pass the object we want to persist to EntityManager.persist. It all seems to work quite well until we run into the dreaded "detached entity passed to persist" message. Or a similar message when we use a different JPA provider than the Hibernate EntityManager.
So what is that detached entity the message talks about? A detached entity (a.k.a. a detached object) is an object that has the same ID as an entity in the persistence store but that is no longer part of a persistence context (the scope of an EntityManager session). The two most common causes for this are:
- The EntityManager from which the object was retrieved has been closed.
- The object was received from outside of our application, e.g. as part of a form submission, a remoting protocol such as Hessian, or through a BlazeDS AMF Channel from a Flex client.
The contract for persist (see section 3.2.1 of the JPA 1.0 spec) explicitly states that an EntityExistsException is thrown by the persist method when the object passed in is a detached entity. Or any other PersistenceException when the persistence context is flushed or the transaction is committed. Note that it is not a problem to persist the same object twice within one transaction. The second invocation will just be ignored, although the persist operation might be cascaded to any associations of the entity that were added since the first invocation. Apart from that latter consideration there is no need to invoke EntityManager.persist on an already persisted entity because any changes will automatically be saved at flush or commit time.
saveOrUpdate vs. merge
Those of you that have worked with plain Hibernate will probably have grown quite accustomed to using the Session.saveOrUpdate method to save entities. The saveOrUpdate method figures out whether the object is new or has already been saved before. In the first case the entity is saved, in the latter case it is updated.
When switching from Hibernate to JPA a lot of people are dismayed to find that method missing. The closest alternative seems to be the EntityManager.merge method, but there is a big difference that has important implications. The Session.saveOrUpdate method, and its cousin Session.update, attach the passed entity to the persistence context, while the EntityManager.merge method copies the state of the passed object to the persistent entity with the same identifier and then returns a reference to that persistent entity. The object passed is not attached to the persistence context.
That means that after invoking EntityManager.merge, we have to use the entity reference returned from that method in place of the original object passed in. This is unlike the way one can simply invoke EntityManager.persist on an object (even multiple times as mentioned above!) to save it and continue to use the original object. Hibernate's Session.saveOrUpdate does share that nice behaviour with EntityManager.persist (or rather Session.save) even when updating, but it has one big drawback; if an entity with the same ID as the one we are trying to update, i.e. reattach, is already part of the persistence context, a NonUniqueObjectException is thrown. And figuring out what piece of code persisted (or merged or retrieved) that other entity is harder than figuring out why we get a "detached entity passed to persist" message.
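In code, that looks like this (a minimal sketch):
Order merged = entityManager.merge(detachedOrder);
merged.setCustomerName("New name"); // tracked: 'merged' is managed
detachedOrder.setCustomerName("Ignored"); // not tracked: still detached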
Putting it all together
So let's examine the three possible cases and what the different methods do:
| Scenario | EntityManager.persist | EntityManager.merge | Session.saveOrUpdate |
|---|---|---|---|
| Object passed was never persisted | 1. Object added to persistence context as new entity. 2. New entity inserted into database at flush/commit. | 1. State copied to new entity. 2. New entity added to persistence context. 3. New entity inserted into database at flush/commit. 4. New entity returned. | 1. Object added to persistence context as new entity. 2. New entity inserted into database at flush/commit. |
| Object was previously persisted, but not loaded in this persistence context | 1. EntityExistsException thrown (or a PersistenceException at flush/commit). | 1. Existing entity loaded. 2. State copied from object to loaded entity. 3. Loaded entity updated in database at flush/commit. 4. Loaded entity returned. | 1. Object added to persistence context. 2. Loaded entity updated in database at flush/commit. |
| Object was previously persisted and already loaded in this persistence context | 1. EntityExistsException thrown (or a PersistenceException at flush or commit time). | 1. State from object copied to loaded entity. 2. Loaded entity updated in database at flush/commit. 3. Loaded entity returned. | 1. NonUniqueObjectException thrown. |
Looking at that table one may begin to understand why the saveOrUpdate method never became a part of the JPA specification and why the JSR members instead chose to go with the merge method. BTW, you can find a different angle on the saveOrUpdate vs. merge problem in Stevi Deter's blog about the subject.
The problem with merge
Before we continue, we need to discuss one disadvantage of the way EntityManager.merge works; it can easily break bidirectional associations. Consider the example with the Order and OrderLine classes from the previous blog in this series. If an updated OrderLine object is received from a web front end (or from a Hessian client, or a Flex application, etc.) the order field might be set to null. If that object is then merged with an already loaded entity, the order field of that entity is set to null. But it won't be removed from the orderLines set of the Order it used to refer to, thereby breaking the invariant that every element in an Order's orderLines set has its order field set to point back at that Order.
In this case, or other cases where the simplistic way EntityManager.merge copies the object state into the loaded entity causes problems, we can fall back to the DIY merge pattern. Instead of invoking EntityManager.merge we invoke EntityManager.find to find the existing entity and copy over the state ourselves. If EntityManager.find returns null we can decide whether to persist the received object or throw an exception. Applied to the Order class this pattern could be implemented like this:
Order existingOrder = dao.findById(receivedOrder.getId());
if(existingOrder == null) {
dao.persist(receivedOrder);
} else {
existingOrder.setCustomerName(receivedOrder.getCustomerName());
existingOrder.setDate(receivedOrder.getDate());
}
The pattern
So where does all this leave us? The rule of thumb I stick to is this:
- When and only when (and preferably where) we create a new entity, invoke EntityManager.persist to save it. This makes perfect sense when we view our domain access objects as collections. I call this the persist-on-new pattern.
- When updating an existing entity, we do not invoke any EntityManager method; the JPA provider will automatically update the database at flush or commit time.
- When we receive an updated version of an existing simple entity (an entity with no references to other entities) from outside of our application and want to save the new state, we invoke EntityManager.merge to copy that state into the persistence context. Because of the way merging works, we can also do this if we are unsure whether the object has been already persisted.
- When we need more control over the merging process, we use the DIY merge pattern.
I hope this blog gives you some pointers on how to save entities and how to work with detached entities. We'll get back to detached entities when we discuss Data Transfer Objects in a later blog. But next week we'll handle a number of common entity retrieval patterns first. In the meantime your feedback is welcome. What are your JPA patterns?
Bidirectional associations
JPA offers the @OneToMany, @ManyToOne, @OneToOne, and @ManyToMany annotations to map associations between objects. While EJB 2.x offered container managed relationships to manage these associations, and especially to keep bidirectional associations in sync, JPA leaves more up to the developer.
The setup
Let's start by expanding the Order example from the previous blog with an OrderLine object. It has an id, a description, a price, and a reference to the order that contains it:
@Entity
public class OrderLine {
@Id
@GeneratedValue
private int id;
private String description;
private int price;
@ManyToOne
private Order order;
public int getId() { return id; }
public void setId(int id) { this.id = id; }
public String getDescription() { return description; }
public void setDescription(String description) { this.description = description; }
public int getPrice() { return price; }
public void setPrice(int price) { this.price = price; }
public Order getOrder() { return order; }
public void setOrder(Order order) { this.order = order; }
}
Using the generic DAO pattern, we quickly get ourselves a very basic OrderLineDao interface and an implementation:
public interface OrderLineDao extends Dao<Integer, OrderLine> {
public List<OrderLine> findOrderLinesByOrder(Order o);
}
public class JpaOrderLineDao extends JpaDao<Integer, OrderLine> implements
OrderLineDao {
public List<OrderLine> findOrderLinesByOrder(Order o) {
Query q = entityManager.createQuery("SELECT e FROM "
+ entityClass.getName() + " e WHERE order = :o ");
q.setParameter("o", o);
return (List<OrderLine>) q.getResultList();
}
}
We can use this DAO to add an order line to an order, or to find all the order lines for an order:
OrderLine line = new OrderLine();
line.setDescription("Java Persistence with Hibernate");
line.setPrice(5999);
line.setOrder(o);
orderLineDao.persist(line);
Collection<OrderLine> lines = orderLineDao.findOrderLinesByOrder(o);
Mo associations, mo problems
All this is pretty straight forward, but it gets interesting when we make this association bidirectional. Let's add an orderLines field to our Order object and include a naïve implementation of the getter/setter pair:
@OneToMany(mappedBy = "order")
private Set<OrderLine> orderLines = new HashSet<OrderLine>();
public Set<OrderLine> getOrderLines() { return orderLines; }
public void setOrderLines(Set<OrderLine> orderLines) { this.orderLines = orderLines; }
The mappedBy field on the @OneToMany annotation tells JPA that this is the reverse side of an association and, instead of mapping this field directly to a database column, it can look at the order field of an OrderLine object to know with which Order object it goes.
So without changing the underlying database we can now retrieve the orderlines for an order like this:
Collection<OrderLine> lines = o.getOrderLines();
No more need to access the OrderLineDao.
But there is a catch! While container managed relationships (CMR) as defined by EJB 2.x made sure that adding an OrderLine object to the orderLines property of an Order also sets the order property on that OrderLine (and vice versa), JPA (being a POJO framework) performs no such magic. This is actually a good thing because it makes our domain objects usable outside of a JPA container, which means you can test them more easily and use them when they have not been persisted (yet). But it can also be confusing for people that were used to EJB 2.x CMR behaviour.
If you run the examples above in separate transactions, you will find that they run correctly. But if you run them within one transaction like the code below does, you will find that the item list will be empty:
Order o = new Order();
o.setCustomerName("Mary Jackson");
o.setDate(new Date());
OrderLine line = new OrderLine();
line.setDescription("Java Persistence with Hibernate");
line.setPrice(5999);
line.setOrder(o);
System.out.println("Items ordered by " + o.getCustomerName() + ": ");
Collection<OrderLine> lines = o.getOrderLines();
for (OrderLine each : lines) {
System.out.println(each.getId() + ": " + each.getDescription()
+ " at $" + each.getPrice());
}
This can be fixed by adding the following line before the first System.out.println statement:
o.getOrderLines().add(line);
Fixing and fixing...
It works, but it's not very pretty. It breaks the abstraction and it's brittle as it depends on the user of our domain objects to correctly invoke these setters and adders. We can fix this by moving that invocation into the definition of OrderLine.setOrder(Order):
public void setOrder(Order order) {
this.order = order;
order.getOrderLines().add(this);
}
We can do even better by encapsulating the orderLines property of the Order object in a better manner:
public Set<OrderLine> getOrderLines() { return orderLines; }
public void addOrderLine(OrderLine line) { orderLines.add(line); }
And then we can redefine OrderLine.setOrder(Order) as follows:
public void setOrder(Order order) {
this.order = order;
order.addOrderLine(this);
}
Still with me? I hope so, but if you're not, please try it out and see for yourself.
Now another problem pops up. What if someone directly invokes the Order.addOrderLine(OrderLine) method? The OrderLine will be added to the orderLines collection, but its order property will not point to the order it belongs to. Modifying Order.addOrderLine(OrderLine) like below will not work because it will cause an infinite loop with addOrderLine invoking setOrder invoking addOrderLine invoking setOrder etc.:
public void addOrderLine(OrderLine line) {
orderLines.add(line);
line.setOrder(this);
}
This problem can be solved by introducing an Order.internalAddOrderLine(OrderLine) method that only adds the line to the collection, but does not invoke line.setOrder(this). This method will then be invoked from OrderLine.setOrder(Order) and not cause an infinite loop. Users of the Order class should invoke Order.addOrderLine(OrderLine).
The pattern
Taking this idea to its logical conclusion we end up with these methods for the OrderLine class:
public Order getOrder() { return order; }
public void setOrder(Order order) {
if (this.order != null) { this.order.internalRemoveOrderLine(this); }
this.order = order;
if (order != null) { order.internalAddOrderLine(this); }
}
And these methods for the Order class:
public Set<OrderLine> getOrderLines() { return Collections.unmodifiableSet(orderLines); }
public void addOrderLine(OrderLine line) { line.setOrder(this); }
public void removeOrderLine(OrderLine line) { line.setOrder(null); }
public void internalAddOrderLine(OrderLine line) { orderLines.add(line); }
public void internalRemoveOrderLine(OrderLine line) { orderLines.remove(line); }
These methods provide a POJO-based implementation of the CMR logic that was built into EJB 2.x. With the typical POJOish advantages of being easier to understand, test, and maintain.
Of course there are a number of variations on this theme:
- If Order and OrderLine are in the same package, you can give the internal... methods package scope to prevent them from being invoked by accident. (This is where C++'s friend class concept would come in handy. Then again, let's not go there.)
- You can do away with the removeOrderLine and internalRemoveOrderLine methods if order lines will never be removed from an order.
- You can move the responsibility for managing the bidirectional association from the OrderLine.setOrder(Order) method to the Order class, basically flipping the idea around. But that would mean spreading the logic over the addOrderLine and removeOrderLine methods.
- Instead of, or in addition to, using Collections.unmodifiableSet to make the orderLines set read-only at run-time, you can also use generic types to make it read-only at compile-time:
public Set<? extends OrderLine> getOrderLines() { return Collections.unmodifiableSet(orderLines); }
But this makes it harder to mock these objects with a mocking framework such as EasyMock.
There are also some things to consider when using this pattern:
- Adding an OrderLine to an Order does not automatically persist it. You'll need to also invoke the persist method on its DAO (or the EntityManager) to do that. Or you can set the cascade property of the @OneToMany annotation on the Order.orderLines property to CascadeType.PERSIST (at least) to achieve that. More on this when we discuss the EntityManager.persist method.
- Bidirectional associations do not play well with the EntityManager.merge method. We will discuss this when we get to the subject of detached objects.
- When an entity that is part of a bidirectional association is (about to be) removed, it should also be removed from the other end of the association. This will also come up when we talk about the EntityManager.remove method.
- The pattern above only works when using field access (instead of property/method access) to let your JPA provider populate your entities. Field access is used when the @Id annotation of your entity is placed on the corresponding field as opposed to the corresponding getter. Whether to prefer field access or property/method access is a contentious issue to which I will return in a later blog.
- And last but not least; while this pattern may be a technically sound POJO-based implementation of managed associations, you can ask why you need all those getters and setters. Why would you need to be able to use both Order.addOrderLine(OrderLine) and OrderLine.setOrder(Order) to achieve the same result? Doing away with one of these could make our code simpler. See for example Allen Holub's article on getters and setters. Then again, we've found that this pattern gives developers that use these domain objects the flexibility to associate them as they wish.
Sunday, August 30, 2009
JPA Performance: Don't Ignore the Database
Database Schema
Good database schema design is important for performance. One of the most basic optimizations is to design your tables to take as little space on disk as possible; this makes disk reads faster and uses less memory for query processing.
Data Types
You should use the smallest data types possible, especially for indexed fields. The smaller your data types, the more indexes (and data) can fit into a block of memory, and the faster your queries will be.
Normalization
Database normalization eliminates redundant data, which usually makes updates faster since there is less data to change. However, a normalized schema causes joins for queries, which makes queries slower; denormalization speeds retrieval. More normalized schemas are better for applications involving many transactions; less normalized schemas are better for reporting types of applications. You should normalize your schema first, then denormalize later. Applications often need to mix the approaches, for example use a partially normalized schema, and duplicate, or cache, selected columns from one table in another table. With JPA O/R mapping you can use the @Embedded annotation for denormalized columns to specify a persistent field whose @Embeddable type can be stored as an intrinsic part of the owning entity and share the identity of the entity.
Database Normalization and Mapping Inheritance Hierarchies
A class inheritance hierarchy will be used as an example of JPA O/R mapping.
In the single-table-per-class mapping, all classes in the hierarchy are mapped to a single table in the database. This table has a discriminator column (mapped by @DiscriminatorColumn) which identifies the subclass. Advantages: this is fast for querying; no joins are required. Disadvantages: it wastes space, since all inherited fields are in every row, and a deep inheritance hierarchy will result in wide tables with many, often empty, columns.
In the joined-subclass mapping, the root of the class hierarchy is represented by a single table, and each subclass has a separate table that only contains those fields specific to that subclass. This is normalized (it eliminates redundant data), which is better for storage and updates. However, queries cause joins, which makes queries slower, especially for deep hierarchies, polymorphic queries, and relationships.
In the table-per-concrete-class mapping (optional in JPA 1.0 and JPA 2.0), every concrete class is mapped to a table in the database and all the inherited state is repeated in that table. This is not normalized: inherited data is repeated, which wastes space. Queries for entities of the same type are fast, but polymorphic queries cause unions, which are slower.
Know what SQL is executed
You need to understand the SQL queries your application makes and evaluate their performance. It's a good idea to enable SQL logging, then go through a use case scenario to check the executed SQL. Logging is not part of the JPA specification. With EclipseLink you can enable logging of SQL by setting the following property in the persistence.xml file:
<properties>
<property name="eclipselink.logging.level" value="FINE"/>
</properties>
With Hibernate you set the following property in the persistence.xml file:
<properties>
<property name="hibernate.show_sql" value="true" />
</properties>
Basically you want to make your queries access less data. Is your application retrieving more data than it needs? Are queries accessing too many rows or columns? Is the database analyzing more rows than it needs? Watch out for the following:
- queries which execute too often to retrieve needed data
- retrieving more data than needed
- queries which are too slow
- you can use EXPLAIN to see where you should add indexes
With MySQL you can use the slow query log to see which queries are executing slowly, or you can use the MySQL query analyzer to see slow queries, query execution counts, and results of EXPLAIN statements.
Understanding EXPLAIN
For slow queries, you can precede a SELECT statement with the keyword EXPLAIN to get information about the query execution plan, which explains how it would process the SELECT, including information about how tables are joined and in which order. This helps find missing indexes early in the development process.
You should index columns that are frequently used in Query WHERE, GROUP BY clauses, and columns frequently used in joins, but be aware that indexes can slow down inserts and updates.
Lazy Loading and JPA
With JPA, one-to-many and many-to-many relationships are lazily loaded by default, meaning they will be loaded when the entity in the relationship is accessed. Lazy loading is usually good, but if you need to access all of the "many" objects in a relationship, it will cause n+1 selects, where n is the number of "many" objects.
You can change the relationship to be loaded eagerly, for example:
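A minimal sketch (the Employee and Address entities and field names are assumptions, not from the original post):
@Entity
public class Employee {
@Id
@GeneratedValue
private int id;
// FetchType.EAGER loads the addresses together with the Employee
@OneToMany(mappedBy = "employee", fetch = FetchType.EAGER)
private Collection<Address> addresses;
}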
However you should be careful with eager loading which could cause SELECT statements that fetch too much data. It can cause a Cartesian product if you eagerly load entities with several related collections.
If you want to override the LAZY fetch type for specific use cases, you can use a fetch join. For example, this kind of query would eagerly load the employee addresses:
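A sketch (assuming the illustrative Employee entity above with an addresses collection):
Query q = entityManager.createQuery(
"SELECT e FROM Employee e JOIN FETCH e.addresses");
List<Employee> employees = (List<Employee>) q.getResultList();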
In general you should lazily load relationships, test your use case scenarios, check the SQL log, and use @NamedQueries with JOIN FETCH to eagerly load when needed.
Partitioning
The main goal of partitioning is to reduce the amount of data read for particular SQL operations so that the overall response time is reduced.
Vertical Partitioning splits tables with many columns into multiple tables with fewer columns, so that only certain columns are included in a particular dataset, with each partition including all rows.
Horizontal Partitioning segments table rows so that distinct groups of physical row-based datasets are formed. All columns defined to a table are found in each set of partitions. An example of horizontal partitioning might be a table that contains historical data being partitioned by date.

Vertical Partitioning
In the example of vertical partitioning below a table that contains a number of very wide text or BLOB columns that aren't referenced often is split into two tables with the most referenced columns in one table and the seldom-referenced text or BLOB columns in another.
By removing the large data columns from the table, you get a faster query response time for the more frequently accessed Customer data. Wide tables can slow down queries, so you should always ensure that all columns defined to a table are actually needed.

In the JPA mapping for these tables, the Customer table with the more frequently accessed and smaller data types is mapped to the Customer entity, and the CustomerInfo table with the less frequently accessed and larger data types is mapped to the CustomerInfo entity with a lazily loaded one-to-one relationship to the Customer.
Horizontal Partitioning
The major forms of horizontal partitioning are by range, hash, hash key, list, and composite. Horizontal partitioning can make queries faster because the query optimizer knows what partitions contain the data that will satisfy a particular query and will access only those necessary partitions during query execution. Horizontal partitioning works best for large database applications that contain a lot of query activity that targets specific ranges of database tables.

Hibernate Shards
Partitioning data horizontally into "shards" is used by Google, LinkedIn, and others to give extreme scalability for very large amounts of data. eBay "shards" data horizontally along its primary access path. Hibernate Shards is a framework that is designed to encapsulate support for horizontal partitioning into Hibernate Core.

Caching
JPA Level 2 (L2) caching avoids database access for already loaded entities; this makes reading frequently accessed, unmodified entities faster, but it can hurt scalability for frequently or concurrently updated entities. You should configure L2 caching (see the sketch after this list) for entities that are:
- read often
- modified infrequently
- Not critical if stale
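A JPA 2.0 sketch of opting such an entity into the shared cache (the Country entity is an illustrative assumption; it also requires <shared-cache-mode>ENABLE_SELECTIVE</shared-cache-mode> in persistence.xml):
@Entity
@Cacheable // read often, modified infrequently, staleness acceptable
public class Country {
@Id
private int id;
private String name;
}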
References and More Information:
JPA Best Practices presentation
MySQL for Developers article
MySQL for developers presentation
MySQL for developers screencast
Keeping a Relational Perspective for Optimizing Java Persistence
Java Persistence with Hibernate
Pro EJB 3: Java Persistence API
Java Persistence API 2.0: What's New ?
High Performance MySQL book
Pro MySQL, Chapter 6: Benchmarking and Profiling
EJB 3 in Action
sharding the hibernate way
JPA Caching
Best Practices for Large-Scale Web Sites: Lessons from eBay
Original Source
Wednesday, July 15, 2009
JPA implementation patterns: Data Access Objects
The JPA, short for Java Persistence API, is part of the Java EE 5 specification and has been implemented by Hibernate, TopLink, EclipseLink, OpenJPA, and a number of other object-relational mapping (ORM) frameworks. Because JPA was originally designed as part of the EJB 3.0 specification, you can use it within an EJB 3.0 application. But it works equally well outside of EJB 3.0, for example in a Spring application. And when even Gavin King, the designer of Hibernate, recommends using JPA in the second edition of Hibernate in Action, a.k.a. Java Persistence with Hibernate, it's obvious that JPA is here to stay.
Once you get over your fear of annotations ;-), you find that there is plenty of literature out there that explains the objects and methods within the API, the way these objects work together and how you can expect them to be implemented. And when you stick to hello-world-style programs, it all seems pretty straight forward. But when you start writing your first real application, you find that things are not so simple. The abstraction provided by JPA is pretty leaky and has ramifications for larger parts of your application than just your Data Access Objects (DAO's) and your domain objects. You need to make decisions on how to handle transactions, lazy loading, detached objects (think web frameworks), inheritance, and more. And it turns out that the books and the articles don't really help you here.
At least, that is what I discovered when I really started using JPA for the first time. In the coming weeks, I would like to discuss the choices I came up against and the decisions I made and why I made them. When I'm done, we'll have a number of what I would like to not-too-modestly call JPA implementation patterns.
Do we really need a DAO?
So, let's start with the thing you would probably write first in your JPA application: the data access object (DAO). An interesting point to tackle before we even start is whether you even need a DAO when using JPA. The conclusion of that discussion more than a year ago was "It depends" and while it is very hard to argue with such a conclusion :-), I would like to stick with the idea that a DAO does have its place in a JPA application. Arguably it provides only a thin layer on top of JPA, but more importantly making a DAO per entity type gives you these advantages:
- Instead of having to pick the right EntityManager method every time you want to store or load data, you decide which one to use once and you and your whole team can easily stick to that choice.
- You can disallow certain operations for certain entity types. For example, you might never want your code to remove log entries. When using DAO's, you just do not add a remove method to your LogEntry DAO.
- Theoretically, by using DAO's you could switch to another persistence system (like plain JDBC or iBATIS). But because JPA is such a leaky abstraction I think that that is not realistically possible for even a slightly complex application. You do get a single point of entry where you can add tracing features or keep performance statistics.
- You can centralize all the queries on a certain entity type instead of scattering them through your code. You could use named queries to keep the queries with the entity type, but you'd still need some central place where the right parameters are set. Putting both the query, the code that sets the parameters, and the cast to the correct return type in the DAO seems a simpler thing to do. For example:
public List<ChangePlan> findExecutingChangePlans() {
Query query = entityManager.createQuery(
"SELECT plan FROM ChangePlan plan WHERE plan.state = 'EXECUTING'");
return (List<ChangePlan>) query.getResultList();
}
So when you decide you are going to use DAO's, how do you go about writing them? The highlighted (in bold) comment in the Javadoc for Spring's JpaTemplate seems to suggest that there's not much point in using that particular class, which also makes JpaDaoSupport superfluous. Instead you can write your JPA DAO as a POJO using the @PersistenceContext annotation to get an EntityManager reference. It will work in an EJB 3.0 container and it will work in Spring 2.0 and up if you add the PersistenceAnnotationBeanPostProcessor bean to your Spring context.
The type-safe generic DAO pattern
Because each DAO shares a lot of functionality with the other DAO's, it makes sense to have a base class with the shared functionality and then subclass from that for each specific DAO. There are a lot of blogs out there about such a type-safe generic DAO pattern and you can even download some code from Google Code. When we combine elements from all these sources, we get the following JPA implementation pattern for DAO's.
The entity class
Let's say we want to persist the following Order class:
@Entity
@Table(name = "ORDERS")
public class Order {
@Id
@GeneratedValue
private int id;
private String customerName;
private Date date;
public int getId() { return id; }
public void setId(int id) { this.id = id; }
public String getCustomerName() { return customerName; }
public void setCustomerName(String customerName) { this.customerName = customerName; }
public Date getDate() { return date; }
public void setDate(Date date) { this.date = date;}
}
Don't worry too much about the details of this class. We will revisit the specifics in other JPA implementation patterns. The @Table annotation is there because ORDER is a reserved keyword in SQL.
The DAO interfaces
First we define a generic DAO interface with the methods we'd like all DAO's to share:
public interface Dao<K, E> {
void persist(E entity);
void remove(E entity);
E findById(K id);
}
The first type parameter, K, is the type to use as the key and the second type parameter, E, is the type of the entity. Next to the basic persist, remove, and findById methods, you might also like to add a List<E> findAll() method.
Then we define one subinterface for each entity type we want to persist, adding any entity specific methods we want. For example, if we'd like to be able to query all orders that have been added since a certain date, we can add such a method:
public interface OrderDao extends Dao<Integer, Order> {
List<Order> findOrdersSubmittedSince(Date date);
}
The base DAO implementation
The third step is to create a base JPA DAO implementation. It will have basic implementation of all the methods in the standard Dao interface we created in step 1:
public abstract class JpaDao<K, E> implements Dao<K, E> {
protected Class<E> entityClass;
@PersistenceContext
protected EntityManager entityManager;
public JpaDao() {
ParameterizedType genericSuperclass = (ParameterizedType) getClass().getGenericSuperclass();
this.entityClass = (Class<E>) genericSuperclass.getActualTypeArguments()[1];
}
public void persist(E entity) { entityManager.persist(entity); }
public void remove(E entity) { entityManager.remove(entity); }
public E findById(K id) { return entityManager.find(entityClass, id); }
}
Most of the implementation is pretty straight forward. Some points to note though:
- The constructor of the JpaDao includes the method proposed by my colleague Arjan Blokzijl to use reflection to get the entity class.
- The @PersistenceContext annotation causes the EJB 3.0 container or Spring to inject the entity manager.
- The entityManager and entityClass fields are protected so that subclasses, i.e. specific DAO implementations, can access them.
The specific DAO implementation
And finally we create such a specific DAO implementation. It extends the basic JPA DAO class and implements the specific DAO interface:
public class JpaOrderDao extends JpaDao<Integer, Order> implements OrderDao {
public List<Order> findOrdersSubmittedSince(Date date) {
Query q = entityManager.createQuery(
"SELECT e FROM " + entityClass.getName() + " e WHERE date >= :date_since");
q.setParameter("date_since", date);
return (List<Order>) q.getResultList();
}
}
Using the DAO
How you get a reference to an instance of your OrderDao depends upon whether we use EJB 3.0 or Spring. In EJB 3.0 we'd use an annotation like this:
@EJB(name="orderDao")
private OrderDao orderDao;
while in Spring we can use the XML bean files or we can use autowiring like this:
@Autowired
public OrderDao orderDao;
In any case, once we have a reference to the DAO we can use it like this:
Order o = new Order();
o.setCustomerName("Peter Johnson");
o.setDate(new Date());
orderDao.persist(o);
But we can also use the entity specific query we added to the OrderDao interface:
List<Order> orders = orderDao.findOrdersSubmittedSince(date);
for (Order each : orders) {
System.out.println("order id = " + each.getId());
}
With this type-safe DAO pattern we get the following advantages:
- No direct dependency on the JPA api from client code.
- Type-safety through the use of generics. Any casts that still need to be done are handled in the DAO implementation.
- One logical place to group all entity-specific JPA code.
- One location to add transaction markers, debugging, profiling, etc. Although as we will see later, we will need to add transaction markers in other parts of our applications too.
- One class to test when testing the database access code. We will revisit this subject in a later JPA implementation pattern.
I hope this convinces you that you do need DAO's with JPA.
And that wraps up the first JPA implementation pattern. In the next blog of this series we will build on this example to discuss the next pattern. In the meantime I would love to hear from you how you write your DAO's!