Wednesday, December 27, 2017

[OT] Work Next Year the way Arsenal and Liverpool played their football

I had been short of ideas for the regular, and last, off-topic post of the year, wondering which piece of music I should link to.

And then the inspiration came from a completely unexpected source.

Those of you who follow Arsenal FC in the English Premier League know that Arsenal can either draw against Liverpool in such a way that fans will remember it for years (yes, that Liverpool 4 to Arsenal 4 draw), or, more likely, lose badly to this Merseyside team, something like 1:4.

So less than a week ago, Arsenal were playing Liverpool in London and losing 0:2. Oh well, most Arsenal fans thought, one of those days which can only be described in Fever Pitch. Then, a few minutes into the 2nd half, while the fans were having mince pies and tea, Arsenal were 3:2 up, before Liverpool managed to equalize. The game saw many mistakes and brilliant moves, and the fans just had the day of the year watching it. I liked this summary.

How would I translate that into a New Year wish for you, the software engineers? Here it is:

Enjoy your work next year, try to do something extraordinary, something new, and don't be afraid to make mistakes :-)

Happy New Year!


Get OpenAPI v3 JSON with CXF Now!

The Apache CXF team has done some initial work to have OpenAPI v3 JSON reported by JAX-RS endpoints.

Andriy Redko has started with the OpenApiFeature implementation, which depends on the latest OpenAPI v3 aware swagger-core and swagger-jaxrs libraries, and demoed it here.

In the meantime I wrote a Swagger2 to OpenAPI v3 JSON conversion filter. It reacts to openapi.json queries by converting the Swagger2 swagger.json produced by Swagger2Feature into openapi.json. The idea is to make it easier for existing upstream code (which has already integrated Swagger2Feature) to start experimenting with OpenAPI v3 before switching to the new feature (and its dependencies).

This effort is still a work in progress but the results in both cases are promising. The new feature and the conversion filter will need some more improvements, but you can start experimenting with them right now. And if you are anything like me you will be pleasantly surprised that SwaggerUI 3.6.1 and newer can handle both Swagger2 and OpenAPI v3 JSON :-).

Enjoy!

Simple Reuse of org.reactivestreams in CXF

I mentioned earlier that one could link an RxJava2 Flowable to a JAX-RS AsyncResponse via a Subscriber which makes a best effort at streaming the data pieces converted to JSON array elements, see this example.

That works but requires the application code to refer to both the JAX-RS AsyncResponse and the CXF specific JsonStreamingAsyncSubscriber (RxJava2 specific at the earlier stage), as opposed to simply returning a Flowable from the resource method.

In the meantime, John Ament added the initial Reactor integration code, and as part of this work John also provided an org.reactivestreams compatible JsonStreamingAsyncSubscriber to be optionally used with the CXF Reactor invoker.

As a result we found an opportunity to do some refactoring and introduce a simple org.reactivestreams utility module which is now shared by the CXF RxJava2 and Reactor invokers. The common code that both invokers delegate to checks whether JSON is expected and, if so, registers JsonStreamingAsyncSubscriber as an org.reactivestreams.Subscriber with the org.reactivestreams.Publisher, which can be either an RxJava2 Flowable or a Reactor Flux.

The end result is that users can now write simpler code by returning Flowable or Flux from the service methods.
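To make the idea more concrete, here is a minimal sketch of a subscriber that writes whatever a publisher emits as elements of one JSON array. It is not the CXF JsonStreamingAsyncSubscriber: the names JsonArraySubscriber and ListPublisher are made up for illustration, a StringBuilder stands in for the HTTP response, and the JDK's java.util.concurrent.Flow interfaces (which mirror org.reactivestreams) are used so that the sketch compiles without extra dependencies.

```java
import java.util.List;
import java.util.concurrent.Flow;

// Hypothetical stand-in for a JSON streaming subscriber; NOT the CXF class.
class JsonArraySubscriber implements Flow.Subscriber<String> {
    private final StringBuilder out;        // assumption: stands in for the response stream
    private Flow.Subscription subscription;
    private boolean first = true;
    private Throwable failure;              // recorded only, a real one would fail the response

    JsonArraySubscriber(StringBuilder out) {
        this.out = out;
    }

    @Override
    public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        out.append('[');                    // open the array before the first element
        s.request(1);                       // ask for one element at a time
    }

    @Override
    public void onNext(String jsonElement) {
        if (!first) {
            out.append(',');
        }
        first = false;
        out.append(jsonElement);            // the element is assumed to be JSON already
        subscription.request(1);
    }

    @Override
    public void onError(Throwable t) {
        failure = t;
    }

    @Override
    public void onComplete() {
        out.append(']');                    // close the array once the publisher is done
    }
}

// Hypothetical synchronous publisher, playing the role of a Flowable or a Flux.
class ListPublisher implements Flow.Publisher<String> {
    private final List<String> items;

    ListPublisher(List<String> items) {
        this.items = items;
    }

    @Override
    public void subscribe(Flow.Subscriber<? super String> subscriber) {
        subscriber.onSubscribe(new Flow.Subscription() {
            private int index;
            private long demand;
            private boolean emitting;
            private boolean done;

            @Override
            public void request(long n) {
                if (done) {
                    return;
                }
                demand += n;
                if (emitting) {
                    return;                 // re-entrant call from onNext, the loop below picks it up
                }
                emitting = true;
                try {
                    while (!done) {
                        if (index == items.size()) {
                            done = true;
                            subscriber.onComplete();
                        } else if (demand > 0) {
                            demand--;
                            subscriber.onNext(items.get(index++));
                        } else {
                            break;
                        }
                    }
                } finally {
                    emitting = false;
                }
            }

            @Override
            public void cancel() {
                done = true;
            }
        });
    }
}
```

The subscriber only knows about the Publisher and Subscription contracts, so the same class could be registered with any other publisher implementation, which is the kind of reuse described above.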

It is an interesting but simple example of reusing the org.reactivestreams aware code between different org.reactivestreams implementations.

Tuesday, September 12, 2017

The Real Data Processing with Apache Beam and Tika

If we talk about data ingestion in big data streaming pipelines, it is fair to say that in the vast majority of cases the source data comes from files in CSV and other easy-to-parse text formats.

Things become more complex when the task is to read and parse files in a format such as PDF. One would need to create a reader/receiver capable of parsing the PDF files and feeding the content fragments (the regular text, the text found in the embedded attachments and the file metadata) into the processing pipelines. That was tricky to do right but you did it just fine.

The next morning you get a call from your team lead letting you know the customer actually needs the content ingested not only from PDF files but also from files in a format you've never heard of before. You spend the rest of the week looking for a library which can parse such files, and when you finish writing the code involving that library's poorly documented API all you can think of is that the weekend has arrived just in time.

On Monday your new task is to ensure that the pipelines can be initialized from the same network folder where the files in PDF and the other format will be dropped. You end up writing frontend reader code which reads the file, checks the extension, and then chooses a more specific reader.

The next day, when you are told that Microsoft Excel and Word documents, which may or may not be zipped, will have to be parsed as well, you report back asking for holidays...

I'm sure you already know I've been preparing you for a couple of pieces of good news.

The first one is the well known fact that Apache Tika lets you write generic code which can collect data from a massive number of text, binary, image and video formats. One only has to prepare or update the dependencies and configuration to have the same code serving data from a variety of formats.

The other and main news is that Apache Beam 2.2.0-SNAPSHOT now ships a new TikaIO module (thanks to my colleague JB for reviewing and merging the PR). With Apache Beam capable of running the pipelines on top of Spark, Flink and other runners and Apache Tika taking care of various file formats, you get the most flexible data streaming system.

Do give it a try, help to improve TikaIO with new PRs, and if you are really serious about supporting a variety of data formats in the pipelines, start planning on integrating it into your products :-)

Enjoy!



Wednesday, September 6, 2017

Mostly On Topic: CXF and Swagger Integration Keeps Getting Better

While thinking about a title for this post I thought the current title line, with the "Keeps Getting Better" finishing touch, might work well; I knew I had used a similar line before, and after looking through my posts I found it.

Oh dear. I'm transported back to 2008, I can see myself, 9 years younger, walking to the Iona Technologies office, completely wired on trying to stop the Jersey JAX-RS domination :-), spotting an ad for Christina Aguilera's latest album on the exit from the Lansdowne Dart station and thinking it would be fun to try to blog about it and link it to CXF; welcome to the start of the [OT] series. I'm not sure now if I'm more surprised that it was actually me who wrote that post or that 9 years later I'm still here, talking about CXF :-).

Let me get back to the actual subject of this post. You know CXF started quite late with embracing Swagger, and I still get nervous whenever I remind myself that Swagger does not support 'matrix' parameters :-). But the Swagger team has made a massive effort through the years, my CXF hat is off to them.

I'm happy to say that Apache CXF now offers one of the best Swagger2 integrations around, at both the JSON-only and UI levels, and it just keeps getting better.

We talked recently with Dennis Kieselhorst, and one can now configure Swagger2Feature with an external properties file, which can be especially handy when this feature is auto-discovered.

Just at the last minute we resolved an issue reported by a CXF user to do with accessing Swagger UI via reverse proxies.

Finally, Freeman contributed a java2swagger Maven plugin.

Swagger 3 will be supported as soon as possible too.

Enjoy!

Thursday, August 31, 2017

Apache CXF 3.2.0 NIO Extension

In CXF 3.2.0 we have also introduced a server-side NIO extension which is based on the very first JAX-RS API prototype done by Santiago Pericas-Geertsen. The client NIO API prototype was not ready but the server one had a promising start. It was implemented in CXF as soon as the long-awaited first JAX-RS 2.1 API jar got published to Maven.

However, once the JAX-RS 2.1 group finally resumed its work and started finalizing the NIO API, the early NIO API was unfortunately dropped (IMHO it could've stayed as an entry point, 'easy' NIO API), while the new NIO API did not materialize, primarily due to the time constraints of the JCP process.

The spec leads did all they could but the schedule was too tight for them to get it right. As sad as it was, they made the right decision: rather than do something in a hurry, better to do it right at some later stage...

It was easily the major omission from the final 2.1 API. How long will JAX-RS users have to wait until a new JAX-RS version with the new NIO API is finalized and available to them, given that it takes years for the major Java EE umbrella of various specs to get done?

In the meantime the engineering minds in SpringBoot, RxJava and other teams will come up with some new brilliant ways of doing it. They will be not one but several steps ahead.

Which brings me to this point: if I were to offer a single piece of advice to the Java EE process designers, I'd recommend that they make sure new features can be easily added after the EE release date, with minor EE releases embracing these new features following soon after, without waiting for N years. If that were an option then we could've seen a JAX-RS 2.2 NIO in, say, 6 months - just a dream at the moment, I know. The current mechanism, where EE users wait several years for new features, is out of sync with the competitive reality of the software industry and only works because of the great teams doing EE, the EE users' loyalty and the power of the term 'standard'.

Anyway, throwing away our own implementation of that NIO API prototype, now gone from the 2.1 API, just because it suddenly became code supporting a non-standard feature would not have been a good idea.

It offers an easy link to the Servlet 3.1 NIO extensions from the JAX-RS code and offers real value. Thus the code stayed and is now available for CXF users to experiment with.

It's not very shiny but it will deliver. Seriously, if you need to have a massive InputStream copied to/from the HTTP connection with NIO and asynchronous callbacks involved, what else do you need but a simple and easy way to do it from the code? Well, nothing can be simpler than this option for sure.

Worried a bit that it is not a standard feature? No, it is fine, doing it the CXF way is a standard :-)
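For readers who have not seen the callback style of copying, here is a rough sketch of the pattern the Servlet 3.1 non-blocking write support is built around: the container calls onWritePossible() and the code keeps writing only while the output reports it is ready. This is not the CXF NIO extension API. NonBlockingOutput and StreamCopyListener are hypothetical names, with NonBlockingOutput standing in for the isReady()/write() pair of ServletOutputStream so that the sketch runs with the JDK alone.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a non-blocking output such as a Servlet 3.1 ServletOutputStream.
interface NonBlockingOutput {
    boolean isReady();   // true if the next write will not block
    void write(byte[] buf, int off, int len) throws IOException;
}

// Hypothetical listener shaped like a Servlet 3.1 WriteListener.
class StreamCopyListener {
    private final InputStream in;
    private final NonBlockingOutput out;
    private final byte[] buffer;
    private boolean complete;

    StreamCopyListener(InputStream in, NonBlockingOutput out, int bufferSize) {
        this.in = in;
        this.out = out;
        this.buffer = new byte[bufferSize];
    }

    // Invoked each time the output becomes writable again. The method returns as soon
    // as the output is no longer ready and simply resumes on the next callback.
    void onWritePossible() {
        try {
            while (!complete && out.isReady()) {
                // Assumption: reading the source is cheap (a file or an in-memory stream).
                int n = in.read(buffer);
                if (n == -1) {
                    in.close();
                    complete = true;   // a real listener would complete the async response here
                } else {
                    out.write(buffer, 0, n);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isComplete() {
        return complete;
    }
}
```

No thread is parked while the connection is busy, which is what makes this approach attractive for very large payloads.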
  

JAX-RS 2.1 is Released

JAX-RS 2.1 (JSR 370) has finally been released and JAX-RS users can now look forward to experimenting with the new features very soon, with a number of final JAX-RS 2.1 implementations already available (such as Jersey) or nearly ready to be released.

Apache CXF 3.2.0 is about to be released, and all of the new JAX-RS 2.1 features have been implemented: reactive client API extensions, client/server Server Sent Events support, returning CompletableFuture from the resource methods and other minor improvements.

As part of the 2.1 work (but also based on a CXF JIRA request) we also introduced RxJava Observable and, more recently, RxJava2 Flowable/Observable client and server extensions. One can use them as an alternative to CompletableFuture on the client and/or the server side. Note, the combination of RxJava2 Flowable with JAX-RS AsyncResponse on the server is quite cool.

The other new CXF extension introduced as part of the JAX-RS 2.1 work is the NIO extension, which will be the topic of the next post.
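As a quick illustration of the CompletableFuture option, a resource method could look roughly like the sketch below. BookResource and findTitle are hypothetical names, and the JAX-RS annotations are only shown in comments so that the class compiles with the JDK alone; in a real application they would be actual annotations on the class and the method.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical resource class. In a JAX-RS 2.1 application it would carry @Path("/books").
class BookResource {

    // Assumption: stands in for a slow lookup such as a database or remote call.
    private String findTitle(long id) {
        return "Book #" + id;
    }

    // Would carry @GET, @Path("{id}") and @Produces("text/plain"), with id bound via @PathParam("id").
    // The method returns at once; the runtime writes the response when the future completes.
    CompletableFuture<String> getBookTitle(long id) {
        return CompletableFuture.supplyAsync(() -> findTitle(id));
    }
}
```

Compared with injecting an AsyncResponse and resuming it by hand, the asynchronous result is simply the return value of the method.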

Pavel Bucek and Santiago Pericas-Geertsen were great JAX-RS 2.1 spec leads. Andriy Redko spent a lot of his time getting CXF 3.2.0 ready for JAX-RS 2.1.