Dec 3, 2013

Plain Java EE Architectures: Rethinking Best Practices?

I recently attended a session at the W-JAX13 in Munich on plain JEE architectures. It was an interesting and amusing talk, but the speaker made a few comments on how you should be developing Java enterprise software that got me thinking and that I definitely disagreed with. So I went home and borrowed an e-book on Amazon about Java EE patterns. Again, this book got me thinking. Here are my thoughts.

One big WAR only. Seriously?

In the session the speaker's point was that nowadays you should stick everything into one big WAR file. In JEE 6 there is no need for dedicated EJB-JARs or even EARs anymore (as long as you don't need sophisticated classpaths or JCA connectors. And who does?).

Even though this is technically true, it doesn't mean you should do it just because you can. For trivial applications this might be perfectly reasonable. But for a professional business application with several tens of thousands of lines of code, this would just be insane.

The speaker also argued for more business-oriented components rather than technically-oriented ones. I totally agree. You achieve this by using dedicated Java packages for your business concerns (e.g. customer management, price calculation, shipping workflow, and so on). In my opinion the next logical step is to encapsulate these components in individual artifacts: JARs. Tools like Maven and Gradle provide an easy means to do so with multi-module projects (a sketch follows below). In a complex system this allows you to control the allowed dependencies between your business components. If you had one big WAR only, it would just be too easy to produce a highly tangled design, effectively a Big Ball of Mud. So even in the age of JEE 6, I still think it's a good idea to have individual deployment units for your business components.
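For illustration, here is a sketch of what the parent pom.xml of such a multi-module build might declare (the module names are hypothetical):

<!-- parent pom.xml: one Maven module per business component, plus the WAR that assembles them -->
<modules>
  <module>customer-management</module>
  <module>price-calculation</module>
  <module>shipping-workflow</module>
  <module>web-app</module>
</modules>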

No Interfaces. Really?

Yes, in JEE 6 and 7 you don't have to write a lot of boilerplate XML or Java code anymore. Great. For most cases you have suitable annotations available. Development has become easier and the code is more compact.

Technically, a business interface for your EJB 3.x components is not required anymore; you automagically get a view of all public methods of an Enterprise Java Bean (the no-interface view). But even though this is possible, that doesn't mean it's always a good idea.
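To make that concrete, here is a minimal sketch (hypothetical names, each type in its own file): the first bean exposes the automatic no-interface view, the second hides the same logic behind an explicit local business interface.

import javax.ejb.Local;
import javax.ejb.Stateless;

// Variant 1: no interface at all - the container exposes all public methods.
@Stateless
public class PriceCalculatorBean {
    public double calculate(String productId, int quantity) {
        return 0.0; // business logic omitted
    }
}

// Variant 2: an explicit business interface plus an implementation.
@Local
public interface PriceCalculator {
    double calculate(String productId, int quantity);
}

@Stateless
public class DefaultPriceCalculator implements PriceCalculator {
    @Override
    public double calculate(String productId, int quantity) {
        return 0.0; // business logic omitted
    }
}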

Again, for simple components or applications this approach might work well. I'm not saying that every class needs an interface, but in my experience the real world is just not that simple. Here are some important reasons to use interfaces:

  • Contract first. Before you start implementing the business logic of a new component it is always (and still) good practice to think about its interface first. The interface states the contract between the caller and the callee. What methods are supported? Which parameters need to be passed? What is the outcome? Are there any pre- or postconditions and invariants?
    If you are working with co-located teams, once you have your interface everyone can start coding. In feature-oriented, agile teams the UI and backend developers can implement their specific part of a user story almost independently. All that is required is a well designed interface.
  • Alternative implementations. An interface is usually technology neutral. You could decide to implement it using EJB, CDI or even Spring. OK, in practice this is esoteric, since the overall architecture of your system will probably only ever use one of these technologies and never change.
    But the chance that you will have alternative implementations of an interface is still pretty high. For local development you might have an in-memory version of the component, for full-blown deployment a database implementation, and for testing a simple stub implementation. By using feature flags (such as Togglz) you can easily swap the different implementations (see the sketch after this list).
  • Emergent design. Interfaces are also an excellent enabler for emergent design. You might not know yet whether a component needs to support persistence or whether an in-memory implementation is sufficient. Or does it need to be async in the future? The usual unknown unknowns.
    By using interfaces your components can evolve any way the business requirements need them to. Your calling code does not need to change a bit. Awesome.
  • Easy Testing. The speaker mentioned that this point is no longer valid: a JEE component is a simple POJO and testing is easy enough without interfaces. I beg to disagree. Try mocking a final method of a collaborating class and you'll see my point. It just doesn't work as you might expect (at least not with the mocking frameworks I know). So if you don't want a hard time testing your code: use interfaces.
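Here is a minimal CDI-flavored sketch of the "alternative implementations" point (hypothetical names, each type in its own file; the post mentions Togglz, but a plain CDI @Alternative illustrates the same idea):

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.enterprise.inject.Alternative;

// Contract first: the business interface.
public interface CustomerRepository {
    List<String> findCustomerNames();
}

// Production implementation, e.g. backed by JPA (details omitted).
public class JpaCustomerRepository implements CustomerRepository {
    @Override
    public List<String> findCustomerNames() {
        return Collections.emptyList(); // real JPA query omitted
    }
}

// In-memory alternative for local development and tests;
// activated via an <alternatives> entry in beans.xml.
@Alternative
public class InMemoryCustomerRepository implements CustomerRepository {
    @Override
    public List<String> findCustomerNames() {
        return Arrays.asList("Alice", "Bob");
    }
}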

DAOs are dead. Are they?

An interesting section in the book is about obsolete J2EE/JEE patterns. Apparently, Data Access Objects aka DAOs are dead. For most cases. Mainly because of the following two reasons:

  • Overuse: the idea of decoupling from the database is neither realistic nor necessary in most cases.
  • Abstraction: a complete abstraction of the underlying data store is only possible for simple use cases.

Apparently, the usage of a JPA EntityManager is all the abstraction you need in a JEE environment. In my opinion the DAO pattern is still very useful and vital, even for JEE.

I agree, the ability to swap out different persistence technologies transparently by using a DAO interface is mostly esoteric. Regarding the leaky abstractions: since you can't avoid them, you have to live with them anyway. But all this doesn't render DAOs useless. The main purpose and value of a DAO lies in one fundamental concept: separation of concerns.
In a well designed system your components usually fall into one of the following categories:

  • T-components are independent of the actual business domain model, and deal with some technology or API such as JDBC or low level async I/O.
  • A-components are independent of any technology; they are determined by the business domain and are responsible for the actual application logic.
  • 0-components are basic and usually reusable building blocks, comparable to the classes found in the java.lang or java.util packages. Libraries like Commons Lang or Commons Collections also fall into this category.

A data access object is a typical T-component: no business code, just pure technical interaction with some persistence API or framework. Do you really want the code required to call JPA named queries or the Criteria API to pollute your A-components? I don't think so.

Instead you encapsulate the required code in a nice data access object. Call it something else if you don't like the name. Usually you don't have to write this sort of code manually anyway; for example, you could use or write an annotation-based lightweight code generator to do it for you. Testing your data access code and named queries also becomes a lot easier: write some integration tests to verify your DAOs, and test your business code using simple mock implementations of your DAOs.
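A minimal sketch of such a data access object (hypothetical names, entity and named query; each type in its own file): the JPA plumbing stays in the T-component, and the business code only ever sees the interface.

import java.util.List;
import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

// The contract the A-components program against. Order is a hypothetical JPA entity.
public interface OrderDao {
    List<Order> findOpenOrders(String customerId);
}

// The T-component: pure JPA interaction, no business logic.
@Stateless
public class JpaOrderDao implements OrderDao {

    @PersistenceContext
    private EntityManager em;

    @Override
    public List<Order> findOpenOrders(String customerId) {
        return em.createNamedQuery("Order.findOpen", Order.class) // hypothetical named query
                 .setParameter("customerId", customerId)
                 .getResultList();
    }
}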

Unit tests without value. Pardon?

All patterns in the book have a dedicated section on Testing. In an agile, test driven development world this section is almost mandatory. But wait: for several patterns you read something like "... for this pattern a unit test does not provide any value ...". The reason being the simplicity of the implementation.

Again, I disagree. In my experience, you cannot be thorough enough! Nothing is more annoying than getting an NPE from a really simple method for some corner case. If you are developing test driven, you should be writing the test alongside the actual implementation without any real additional effort. The foundation of your testing strategy should be loads of unit tests (Test Pyramid). Relying on manual system or integration tests to cover all possible cases is just plain careless.
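To illustrate: a trivial corner-case test like the following (JUnit 4, with a hypothetical NameFormatter utility) costs almost nothing to write alongside the implementation, and it is exactly the kind of test that catches the embarrassing NPE.

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class NameFormatterTest {

    @Test
    public void formatsNullAsEmptyString() {
        // corner case: a null input must not blow up with an NPE
        assertEquals("", NameFormatter.format(null));
    }
}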

Rethinking? Yes, but in moderation.

Even though I might disagree with the speaker and author on a few points, I still think reading the book was worthwhile. A novice developer gets a good introduction to JEE and CDI, and a load of usable design patterns that show how to apply these technologies to real-world problems.

For the more experienced Java developers, the Rethinking sections usually contain a good critical appraisal of the well known and established J2EE patterns. Like me, you might not agree with all the points, but at least they get you thinking about your day-to-day work once more. And that is valuable on its own.

But don't throw your best practices in JEE architecture and development overboard just yet. Moderation.

Oct 29, 2013

Log Collection and Analysis with logstash (and Redis, ElasticSearch & Kibana)

In this post I want to show you how to setup a decent and complete infrastructure for centralized log management based on logstash - demonstrated on Apache Tomcat logs on Windows. This post is adapted from a Strata Conference 2013 tutorial by Israel Ekpo and the official logstash getting started guide.

Logstash is a lightweight and simple-to-use toolchain to collect, transform and analyze log data. To do so logstash requires a buffer to collect log events (Redis) and a fulltext search engine as the final destination for the collected events (ElasticSearch). Assembled together, it looks like this:

logstash architecture & data flow

A shipper detects new log events on its node and dispatches them to a central broker. The indexer polls the broker for new events and pushes them to the storage & search facility. The web interface is then able to query the storage & search facility in many ways. So, how do you set up and wire all this together? These are the four necessary steps:

1) Basic setup

Sample application to produce logs
If you don't have your own Tomcat webapp to produce sample logs, you can use this one: https://github.com/israelekpo/graph-service. Just check out the code, re-configure the path to store the data, build it with Maven and deploy the WAR to a Tomcat installation (the path to the installation is referred to as <catalina-home>). Then start up Tomcat and perform some requests. Further details are provided on the mentioned GitHub page. If you don't want to use cURL you can also help yourself with the Chrome Advanced REST Client.

logstash
Download the jar (logstash-<version>-flatjar.jar) from here and place it in any folder (referred to as <logstash-home>).

ElasticSearch
Download the archive from here and uncompress it into a folder (referred to as <elasticsearch-home>).

Redis
Download the archive from here and uncompress it into a folder (referred to as <redis-home>).

It's that easy.

2) The shipper
Now we have to set up the agent which collects log data on a node and puts it into Redis. All you need to do is write a configuration file (e.g. <somewhere>/logstash-agent.conf, referred to as <agent-config-filepath>) with the following content:

input {
 file {
  type => "CATALINA"
  path => "<catalina-home>/logs/catalina.log"
 }
 file {
  type => "ACCESS"
  path => "<catalina-home>/logs/localhost_access_log*.txt"
 } 
}

output {
  stdout { codec => json }
  redis { host => "127.0.0.1" data_type => "list" key => "logstash" }
}

This configuration file makes logstash collect log entries from the Tomcat engine log and all Tomcat access logs (matched by a glob). The default for logstash is to perform a tail on each log file and process only new entries. If you want to initially read the whole file (and then perform a tail) you have to add start_position => "beginning" to the file block.

How to test it:

  1. Start up Redis: <redis-home>\redis-server --loglevel verbose
  2. Start up a logstash agent: java -jar <logstash-home>\logstash-<version>-flatjar.jar agent -f "<agent-config-filepath>"
  3. Make the app produce logs.

On the Redis console you should then see the logstash agent connect:

To be really sure you can open the Redis command line client (on Windows: <redis-home>\redis-cli.exe) and check if there is appropriate data inside of Redis:


3) The indexer
The next step is to set up the indexer which transfers the log data from Redis into ElasticSearch. Same as above, the only thing you have to do is write a configuration file (e.g. <somewhere>/logstash-indexer.conf, referred to as <indexer-config-filepath>) with the following content:


input {
  redis {
    host => "127.0.0.1"
    data_type => "list"
    key => "logstash"
    codec => json
  }
}

output {
  stdout { debug => true debug_format => "json"}
  elasticsearch {
    host => "127.0.0.1"
  }
}

How to test it:

  1. Start up ElasticSearch: <elasticsearch-home>\bin\elasticsearch.bat
  2. Fire up a logstash agent with the right configuration: java -jar <logstash-home>\logstash-<version>-flatjar.jar agent -f "<indexer-config-filepath>"

Now the whole toolchain should be working. To test whether the log data really reaches ElasticSearch you can use its REST API or, better, let the Sense Chrome extension help you. Just install & open it and run the default query. You should see your logs as indexed documents:

Sense output
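If you prefer the raw REST API, a query along these lines (assuming ElasticSearch runs on its default port 9200) should return the indexed log events:

curl "http://127.0.0.1:9200/_search?q=type:ACCESS&pretty"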


4) The web interface
And now it's getting really professional - we also start a web user interface to analyze and visualize the collected log entries. It's that easy:

java -jar <logstash-home>\logstash-<version>-flatjar.jar web

Just point your browser to http://127.0.0.1:9292/ and start searching logs. The web interface is called 'kibana' - you can learn more about kibana at http://kibana.org.

Kibana user interface

The most important links:

Oct 28, 2013

The Big Data Puzzle

The big data ecosystem is currently in its expansion stage: a lot of technologies are popping up but little consolidation is happening. It's hard to keep track of the big picture. Today at the Strata Conference 2013 I attended some talks and participated in some discussions which helped me to better fit together some pieces of the big data technology puzzle:

HBase
  • High performance writes
  • Poor performance queries
  • Ideal partners for data logistics: Flume, Storm, Samza 
  • Supports data updates / deletes but no SQL
  • Best used for data streams (a flow of single-entry inserts) and to store the most recent data
HDFS
  • High performance bulk data loading
  • High performance bulk data reading
  • Efficient data storage (if an efficient format like Parquet is used)
  • Ideal partners for data logistics: Pig, Cascading, Spring Batch
  • Best used as an eternal memory for data
Hive
  • Can access both HBase and HDFS stored data
  • Supports a subset of SQL
  • Best used for big-in / big-out queries e.g. large joins, data enrichment
  • Best used for batch processing (low CPU usage)
Impala
  • Can access both HBase and HDFS stored data and can share metadata with Hive. Can be used side by side with Hive to complement it without replicating data between them.
  • Supports a subset of SQL and is compatible with the Hive API (but not a real drop-in replacement).
  • Not as mature as Hive but some success stories present
  • Commercial MPPs like Vertica and Teradata are faster and more mature, but Impala has a tighter integration into the Hadoop ecosystem and is therefore more flexible. Most important consequence: the data does not have to be replicated into Impala like it has to be into Vertica et al.; Impala can directly access HDFS/HBase data.
  • Best used for big-in / small-out queries e.g. aggregations, groupings
  • Best used for realtime queries (sec-to-min)
Choreography
  • Oozie: More mature and flexible. Larger set of features.
  • Azkaban: Nice and usable UI. Simpler to setup and use.
A possible outlook:

Storage & access layer
  • HDFS is and will remain the dominant virtual file system for big data.
  • The vast number of (columnar) file formats (Parquet, HFile, RCFile, ...) will be consolidated. The beauty contest has already begun.
  • HBase will be the storage layer above HDFS for row-based access and data streams.
Query layer
  • There will be one major SQL-on-HBase/HDFS open source MPP database assembling the best of Impala, Hive, Shark, ...
Choreography
  • The choreography tools will be extended with intelligent cost-based scheduling capabilities.

Sep 27, 2013

FlightRecorder in Java 7u40

Flightrecorder in JDK 7u40 - Come on Oracle


I attended two sessions at JavaOne 2013 about the brand new flightrecorder in Java 7u40 for the HotSpot JVM.

What is flightrecorder?

Flightrecorder is a sampling based profiler combined with an event collection engine built into the JVM. Most of the code originally comes from the JRockit virtual machine, which Oracle got through the BEA/WebLogic acquisition. It is a commercial feature that you have to unlock using command line JVM startup parameters. For use in production you need a valid license; for development it is free to use.
The interesting thing is that flightrecorder can run on production systems with very little overhead. The data can be collected continuously in circular memory buffers, and there are several policies available for dumping the data to disk, which makes offline analysis possible.
The profiler is sampling based and collects start/end times and an estimated (sampled) call count. It has a constant pool to record stack traces for these calls very efficiently. To do so, Oracle has modified the internal representation (in C) of classes and added a class ID to each class. Calls to their internal API (undocumented and unsupported) are built directly into the JVM. Other data collected by flightrecorder, like GC statistics or heap metrics, already exists. Yes, you read that right: Oracle changes the JVM internals to support their proprietary products.
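For reference, unlocking the feature and starting a recording looks roughly like this (the exact options may vary between JDK builds; the file name is hypothetical):

java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder \
     -XX:StartFlightRecording=duration=60s,filename=myrecording.jfr \
     -jar myapp.jar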

On top of that, Oracle has built an Eclipse-based tool called "Mission Control". If you download an Oracle JDK 7u40, the tool is located in the jdk/bin folder.

The tool looks nice but for professional performance analysis we need much more.
  • flightrecorder is currently not open for application based events. For example: in a banking application we have to know to which bank money is transferred. Otherwise the profiler statistics say nothing, because we need to know which bank causes the problem in our transaction. So events have to be enriched with context information.
  • flightrecorder has these capabilities, but there is currently no public API for application events or context enrichment.
  • flightrecorder needs a remote interface to poll events. That is required if you want to record continuously.
  • The data format of the dump file is not open, nor is there any API to get at the data. So you cannot write your own analysis tool for a flightrecorder dump. This is needed when you want to compute your own statistics on the data.


Healthcenter for the IBM JVM

The IBM JVM has the same concept built into their own virtual machine, but they have a public API for tool developers. It enables developers to access all profiling and monitoring data remotely. So you can write your own monitoring tool which constantly consumes the data from the IBM JVM. That is currently not possible with flightrecorder and the Oracle HotSpot JVM.
IBM's business case seems to be based on a support model: if you have a support contract with IBM for your JVM, you get the API for free. I saw a presentation at the JavaOne where IBM presented the health services and their API. One thing I absolutely agree with them on: you cannot write the one and only monitoring, profiling and analysis tool. Without open APIs the whole thing becomes worthless.

Come on Oracle

What we really need is a public API to collect our own events and to implement our own storage policies for long-term evaluation of our data centers, clusters and distributed applications.

Johannes Weigend


CompletableFuture in Java 8

Fork/Join Application Programming with CompletableFuture in Java 8


Today at the JavaOne I saw a very inspiring session about reactive programming with Java 8. The idea is quite simple: build a pipeline of deferred actions (futures) which can run asynchronously, with synchronization methods to join (combine) or split parallel actions. This helps us parallelize our code and make an application more reactive and powerful.

Why we should do so is written here: http://www.reactivemanifesto.org/

Look at some code:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

// User and Product are simple domain classes (not shown) with a name and a matching toString().

final CompletableFuture<ArrayList<User>> users =
        CompletableFuture.supplyAsync(() -> {
            // lookup all users ... this can run for a long time
            return new ArrayList<User>(Arrays.asList(new User("u1")));
        });

final CompletableFuture<ArrayList<Product>> products =
        CompletableFuture.supplyAsync(() -> {
            // lookup all products ... this can run for a long time
            return new ArrayList<Product>(Arrays.asList(new Product("p1")));
        });

final CompletableFuture<String> report = users.thenCombine(products, (u, p) -> {
    return u.toString() + p.toString(); // demo here
});

System.out.println(report.join()); // -> "{u1}{p1}"

The key lies in the supplyAsync factory method. It returns a future whose type parameter is the return type of the asynchronously executed action. In this example we look up products and users and then combine the results into a string.
The search for products and the search for users run in parallel. This class is a very useful wrapper over the ForkJoinPool introduced in JDK 7. You can easily set up parallel workflow pipelines.
You can also use another executor, for example to run in a JEE server (GlassFish 4 has concurrency APIs to execute things in parallel). Take a look at JSR 236.
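A small sketch of using your own executor (lookupUsers and the pool size are hypothetical; in a JEE 7 server you would obtain a ManagedExecutorService per JSR 236 instead of creating a pool yourself):

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService pool = Executors.newFixedThreadPool(4);

// The supplier now runs on our own pool instead of the common ForkJoinPool.
CompletableFuture<List<User>> users =
        CompletableFuture.supplyAsync(() -> lookupUsers(), pool);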

See also:
http://download.java.net/lambda/b88/docs/api/java/util/concurrent/CompletableFuture.html
http://java.dzone.com/articles/java-8-definitive-guide
https://java.net/projects/concurrency-ee-spec


Sep 26, 2013

Lambdas and Streams in JDK 8

Lambdas and Streams at the JavaOne 2013

Lambda

The new lambda expressions in combination with the Stream API in Java 8 lead to the biggest change in Java programming since the introduction of generics in JDK 5.
Lambda functions remove tons of boilerplate code from anonymous inner classes and make the code more compact and readable.

The following example shows typical UI code which is needed for JavaFX in JDK 7.

button.setOnAction(new EventHandler<ActionEvent>() {
    @Override
    public void handle(ActionEvent actionEvent) {
        ...
    }
});

In JDK 8 the same looks like this:

button.setOnAction(event -> {
    ...
});

The JDK has tons of new methods, even in existing interfaces like java.util.Collection, which use lambda functions. To make that possible, you can now provide default implementations of methods in interfaces, so it is possible to add new methods to an existing interface without breaking every implementor. The following example shows the forEach method, which is available on every java.util.Collection (it is actually a default method defined on java.lang.Iterable).


List<Person> persons = ...
persons.forEach(p -> p.setLastName("Doe"));
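As a small illustration of what such a default method looks like (a made-up interface, not from the JDK):

public interface Greeter {

    String name();

    // New method with a default implementation: existing implementors
    // keep compiling and automatically inherit greet().
    default String greet() {
        return "Hello, " + name();
    }
}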

But that's all syntactic sugar. What really makes a difference in daily programming is the new Stream API.

Streams != java.io

What is a stream? You can think of it as an iterator that can traverse a real collection, but also generated data like primes or other endless sequences.
You can filter, map, reduce and split streams, form other streams with different content, or even populate new collections from a stream. Here are some examples:

1. Compute a list of adults from a collection of persons.



// toList(), toSet(), groupingBy(), ... are static imports from java.util.stream.Collectors
List<Person> greater18 = persons.stream()
        .filter(p -> p.getAge() >= 18)
        .collect(toList());

The filtering is lazy: it only executes when the stream is consumed (by the collect(toList()) method). The filter and collect methods are methods of Stream itself; it is a fluent API design.


2. Compute a set of ages of adults from a collection of persons.


Set<Integer> ages = persons.stream()
        .filter(p -> p.getAge() >= 18)
        .map(Person::getAge)
        .collect(toSet());

The new thing here is that we map a Person to an Integer (the age) by referencing the getAge method. It looks a little like C++. It is legal to use existing methods as lambdas.

3. Compute the population per age


Map<Integer, Long> populationPerAge = persons.stream()
        .collect(groupingBy(Person::getAge, counting()));

The map keys are the ages, and the values are the number of people of that age.

4. Compute the names per age


List<Person> persons = ...
Map<Integer, List<String>> namesPerAge = persons.stream()
        .collect(groupingBy(Person::getAge, mapping(Person::getName, toList())));

Compared with traditional collection-based programming, this really changes the way you program. Java programs will look like never before.

BUT: lambdas do not lead to more readable and maintainable code in every case. The following example (shown at the JavaOne as a valid example) is an abuse.



 List<Person> pa = Arrays.asList(new Person("Peter"), new Person("Ken"));

 // a) 
 for (int i = 0; i < pa.size(); i++) {
    Person p = pa.get(i);
    if (p.getName().equals("Ken")) {
        doSomething(p);
    }
 }

 // b) Exactly the same!!!
 IntStream.range(0, pa.size())
     .mapToObj(pa::get)
     .filter(p -> p.getName().equals("Ken"))
     .forEach(Main::doSomething);


Section b) is an exact port of the imperative version in a functional syntax. That does not make sense. The example is hypothetical, but if programmers are taught that lambdas are cool in every way, this can be the result.
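For comparison, a more idiomatic stream version of the same loop could simply be:

pa.stream()
  .filter(p -> p.getName().equals("Ken"))
  .forEach(Main::doSomething);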
In terms of language design it is a problem when a language allows too many different ways to express the same thing. C++ is such a problematic example; its complexity led to the development of Java...

Make sure you have read The Pragmatic Programmer before you get lost in <<>>, -> or :: symbols.

Johannes Weigend


Sep 25, 2013

Unlocking the Java EE Platform with HTML5 - JavaOne 2013

Unlocking the Java EE Platform with HTML5


Geertjan Wielenga, David Haefelfinger and I (Johannes Weigend) gave an interesting talk about HTML5 and JavaScript in combination with Java EE 7. I will share my thoughts here.

What I did was a typical HTML5 application with a pure HTML/JS client and a REST server backend. The application looks like this:






This is a typical single-page application with two logical screens. After entering a search expression the logo disappears (animated) and the result list fades in. The architecture of this application looks like this:






As you can see in the architecture overview, the client is developed with HTML5, JavaScript and CSS. The client executes a REST call to a GlassFish 4 server which is built using JEE 7 and JAX-RS. On the client side I use jQuery for AJAX interaction and DOM access, Knockout.js for binding (the JSON result of the REST call back to our HTML code) and Bootstrap, which gives me a column-oriented (and responsive) layout.
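A minimal sketch of such a REST endpoint (hypothetical names; SearchService and Result are assumed types, the JSON mapping is left to the JAX-RS provider):

import java.util.List;
import javax.ejb.EJB;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;

// JAX-RS resource that delegates the search logic to an injected EJB session
// bean, so the REST layer stays thin.
@Path("search")
@Produces(MediaType.APPLICATION_JSON)
public class SearchResource {

    @EJB
    private SearchService searchService; // @Stateless bean holding the business logic

    @GET
    public List<Result> search(@QueryParam("q") String query) {
        return searchService.find(query); // result list is serialized to JSON by JAX-RS
    }
}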

I also integrate an EJB session bean to keep the REST-annotated code clean of business/search logic, as sketched above. This kind of client has some benefits over a typical JEE JavaServer Faces application:
  • The server itself can be stateless – so we can distribute freely in the cloud – each call can go to a different server. That helps scalability.
  • The development is HTML5/JS/CSS for the client and pure Java on the server. We have a clean separation of concerns – and we can easily distribute the development across different teams.
  • We can use JavaScript libraries on the client – like Bootstrap – even where the server is PHP.

This kind of client also has some pain points:
  • JS is currently not typed, so autocomplete support is limited and refactoring is a nightmare. Microsoft is currently working on TypeScript, which should make this easier for such tools. NetBeans support for TypeScript is on the way.
  • The current library situation is like the Wild West. Everybody claims to have the best library for application development – there are many unsolved questions if you want to build enterprise-scale applications on that unsafe basis.
But nevertheless, this kind of application is a current trend. I will show you in this blog how to build this application with NetBeans 7.4 and GlassFish 4.

The video shows every step of my demo at the JavaOne.