Thursday, December 13, 2012

What does it take to be the Technical Assistant (TA) to the ThoughtWorks Technology Advisory Board?



Jeff at TAB meeting © Martin Fowler
We have a Technology Advisory Board (TAB) to help us keep a pulse on technology throughout our very diverse organization.  The TAB is a group of 21 senior technologists from all over the world that meets over the phone every two weeks and in person twice a year. This group is very busy, because the members are essential to our success in their own regions.  They are the ones that make sure our new projects are technically interesting, keep our projects on track, and fix them when they fall off the track. 

To keep the TAB running smoothly, we have a Technical Assistant (TA).  Like the other members of our TAB, the TA is a technologist.  The TA is an ex-officio member of the TAB that has five primary responsibilities. I have been the TA for the TAB for the last two and a half years. 

The first responsibility of the TA is sharing.  There are two major aspects to sharing:
  • One of the most visible outputs from the TAB and possibly ThoughtWorks as a whole is the Technology Radar.  The primary responsibility for collecting content, writing, and editing the radar belongs to the TA.  For more detail read how the technology radar is created.
  • ThoughtWorks is a very flat organization, and the TAB works in the open.  The notes for our meetings are available to anyone from our organization who is interested.  The TA is responsible for taking meticulous notes during our meetings, summarizing them, and publishing them. 
The second responsibility of the TA is connecting.  This entails a few different things:
  • Collect topics for our meetings, create our agenda with Rebecca Parsons, our CTO, and keep track of things that we will want to come back to later. 
  • Plan meetings, taking into account the availability of our consultants and the dramatic time differences in regions that spread from China to the east coast of the United States.
  • Arrange the logistics for our meetings, such as arranging space to meet, making sure we have the right supplies (we love our sharpies and post-it notes), ordering food, and setting up conference calls.
The third responsibility of the TA is facilitation.  Keeping meetings running smoothly with 21 very vocal technologists can be challenging.  The TA is responsible for facilitating meetings to make sure that they are effective and that everyone, including quieter members have a chance to speak.

The fourth responsibility of the TA is advocacy.  There are two aspects to advocacy:
  • The TA is a representative for the TAB in their region, making sure that the work of the TAB is visible and taken advantage of.  This involves presenting at regional events as well as informal conversations and written communication.
  • The TA is also a representative for their region in the TAB.  They are responsible for keeping a pulse of what is happening in their region and making sure that the important issues are covered by the TAB.
The final responsibility of the TA is contribution.  As an ex-officio member of the TAB, the TA has the opportunity and responsibility to help shape our technology direction. 

Tuesday, October 23, 2012

How the ThoughtWorks Technology Radar is created


ThoughtWorks creates the Technology Radar to help champion the cause of software excellence and revolutionize the IT industry. Because we have lofty goals, this is an intentionally opinionated document that is meant to challenge us all to improve.

The radar is created by the Technology Advisory Board (TAB). It is a group of 21 technologists from around the world that includes Martin Fowler -our chief scientist, Rebecca Parsons - our CTO, Neal Ford - a prolific author and speaker, and Pramod Sadalage –author of NOSQL distilled and Refactoring Databases. This group brings together a cross-section of technologists with very different interests and focuses. We meet in person every 6 months to focus on technology strategy for our company and the IT industry as a whole. The radar is one of ways that we share that strategy.

To prepare for these face-to-face meetings, our consultants collect feedback from their regions about the most interesting techniques, tools, platforms, frameworks, and languages. For example, to collect feedback from our Chicago office, Wendy Istvanick, Pramod Sadalage, and I ran a Chicago technology radar session. The session involved:
  • About 50 consultants
  • Two large walls filled with post-it notes with items that people were passionate about
  • A buffet of really good Mediterranean food
  • 1.5 hours of heated discussions about topics ranging from problems with Rails to emerging cloud providers.


The feedback from these regional meetings and the interests of the members of the TAB are refined at our face-to-face meeting by:
Radar brainstorming session
  • Brainstorming hundreds of items that we are interested in.
  • Discussing the relative importance of these items and how they affect different parts of the world.
  • Choosing the most important items.
  • Deciding how to advise our consultants and the industry as a whole about these items.

As you can imagine, those are some heated discussions, because we have 20 technologists working on very different projects with very different constraints.

Rough radar graphic

After the face-to-face meeting, these discussions are transformed into the document you know as the Technology Radar.
  • We start by picking out items that each member is most knowledgeable about.
  • They do some research to deepen their understanding. This research usually involves discussions with other ThoughtWorkers and our friends in the industry.
  • They write up a paragraph that captures their thoughts about each item.
  • We sort the items and rewrite the descriptions to have a uniform voice and appropriate length.
  • We work with our graphic designer to make the document look professional.


Because of ThoughtWork’s flat organization structure and open culture, we are a group that works out in the open. For us that means our discussions are held on mailing lists that are open to any ThoughtWorker. Early drafts of the radar are shared and we actively encourage discussion and even disagreement from everyone. 

Friday, March 4, 2011

Learning through games

Playing games is a great way to learn something new. It is also a great way to take something that you know intellectually and turn it into something that you feel instinctively. Sometimes you think you are just playing a game, but in reality, you can learn something from everything you try.

My son, Jeffrey, never wanted to learn how to read. He had all the skills for it, he could recognize all the letters and he knew what the sounds of the letters made, but he would never actually sit down and read anything. He said that he couldn’t. That was until we bought the game Kingdom Hearts Re:coded for his Nintendo DS. He really wanted to play and since it is a role playing game, there was a lot of reading needed just to understand what was happening. Initially, he needed to sit with me or his grandma and ask us what most of the words were. As we played more, he started reading more and more of the words on his own. By the time we finished the game he could read anything meant for kids his age.

A few weeks ago, at the Cleveland Agile User’s group, we had a games night. We played a game called “We’re having a party” We had about 35 adults all excited about making invitations for a party that won’t actually happen. We were all rushing seeing how many invitations we could make before the timer ran out, then we were brainstorming about how we could make more invitations next time. While we were having fun, we picked up some important lessons about lean software development, cycle time, and defect prevention.

When you are trying to introduce a new behavior into your team, try to make it into a game.

Tuesday, April 28, 2009

What comes from a 10 year agile project?

Not very many people can claim to be on a 10 year agile project. I have the privileged of being the tech lead for ThoughtWorks' longest running project. About 10 years ago, we started working with our client to build a leasing application that would revolutionize their industry, giving them the ability to grow quickly in boom times and survive through lean times. After 10 years and more than 25 major releases, we continue to do new development and have fun doing it.

This project has been the home of some interesting innovations, including CruiseControl. My project was one of two large projects from ThoughtWorks that started CruiseControl and open sourced it. We think we may have the longest running instance on CruiseControl in the world.

We also have a lot to talk about. The podcast Keeping Grey Code Fit - Perspectives on Mature Projects highlights just a few of our stories.

Even Martin Fowler has chimed in on this project. He refers to us in his blog as the Nashville Project.

Monday, March 23, 2009

Make sure to re-check your assumptions when performance tuning

We found the memory leak. This is what our properly tuned system looks like during a 66 hour test. Notice that the black line does not slowly trend upward, like it did in the last post.


The velocity templating framework that we use for generating xml to send to a third party system was slowly eating away at our memory. It wasn't really a problem with velocity, it was a problem with the way that we had our system tuned.

The first thing to remember is that Java splits up its memory into three parts, called generations. They are Young, Tenured, and Permanent.


Whenever you create a new instance of a class, it is created in the Young generation. This generation is used and quickly discarded each time there is a garbage collection. When the young generation is full, the garbage collector goes through the young generation and copies all the objects that are still referenced to the tenured generation. The place where you find memory leaks in most applications is when you keep references to too many objects. You will start getting out of memory exceptions when the garbage collector starts to copy objects from the Young generation to the Tenured generation, and there is not enough space to fit all of the objects that are still referenced. Frequently, you don't even realize that you are keeping a reference to those objects.


What made this leak particularly difficult to detect was that it was a problem in the permanent space. This area of memory is used to store objects that don't relate to instances of objects. This is where the JVM stores definitions of classes and methods. This area is pretty obscure, and the commercial tool that I used for profiling didn't even notice the leak in that area.

Velocity was creating dynamic classes, which are used once per call, then promptly discarded. These classes are created in the Permanent generation. The problem came from an optimization that we made about 5-7 years ago to speed up the server. We had set the flag noclassgc , which disables garbage collection for class definitions. This option makes sense if you never create dynamic classes, because you don't have to spend time reloading classes into memory.

We were creating classes that would never be used again, and telling the garbage collector not to bother with cleaning up after us. The definition of the classes are very small, which is why it took about 66 hours for the leak to take down the server.

The moral of the story is to revisit your performance optimizations occasionally and make sure they still make sense. The best decision today will not necessarily be the best decision tomorrow, so be ready to change your mind.

Wednesday, March 18, 2009

What takes 125 users 66 hours to discover?

Unfortunately, I think the answer is a very slow memory leak.


For our application, we typically run 3 classes of performance test. Our quick test with 100 users for 1 hour, our longer test with 125 users for 8 hours, and our really long test with 125 users for 66 hours.

The 1 and 8 hour tests were fine this time, but looking at the garbage collection logs and zooming out over a very long time period, we see that the application is holding on to memory. The blue area represents the memory that is used and released. The black area, that is slowing moving upwards, is the memory that is used and not released.

The last performance problem that we fixed was related to xml parsing and xpath. This time it looks like we need to tune Velocity.

Friday, December 5, 2008

Is is a memory leak or a performance problem?

We are about to start user acceptance testing for the next release of our project. The application passed the performance tests in our environment, but failed at our client site. Time to panic?

It doesn't have to be. This is how I analyzed our problem to determine if it was a memory leak or performance problem.

First some context:
The application is a Java J2EE web application running on Weblogic and integrating with about a dozen external systems through a variety of middleware platforms. The most dramatic change for this release involves rewriting one of those integrations to support a new and dramatically different version of an external system. This integration is very complicated and heavy on the xml.

So now I know there is a problem, but I don't know how to reproduce it in our environment. This kind of problem could easily be caused by a memory leak. The easiest way to tell if there is a leak is to look at the garbage collection logs.

The gc.log looks something like this:

[INFO ][memory ] 20.407: parallel nursery GC 378204K->202844K (2097152K), 220.715 ms
[INFO ][memory ] 24.512: parallel nursery GC 463356K->266172K (2097152K), 108.514 ms
[INFO ][memory ] 158.352: parallel nursery GC 594933K->397813K (2097152K), 223.864 ms
[INFO ][memory ] 354.614: parallel nursery GC 710277K->554373K (2097152K), 297.761 ms
[INFO ][memory ] 356.632: parallel nursery GC 918935K->772951K (2097152K), 246.615 ms
[INFO ][memory ] 359.976: parallel nursery GC 1121408K->1000192K (2097152K), 254.630 ms
[INFO ][memory ] 362.469: parallel nursery GC 1372915K->1239347K (2097152K), 245.692 ms
[INFO ][memory ] 366.508: parallel nursery GC 1595123K->1468403K (2097152K), 249.064 ms
[INFO ][memory ] 368.933: parallel nursery GC 1824064K->1698240K (2097152K), 240.599 ms
[INFO ][memory ] 371.724: parallel nursery GC 2047811K->2097152K (2097152K), 229.808 ms


While you could look through this file and try to analyze it in your head, I wouldn't recommend it. This is where a visualization tool comes in really handy. I suggest GCViewer (Thanks Andy)

  • First delete any lines off the top of the file that don't look like the lines I pasted above. They are interesting to read but will confuse GCViewer.
  • Then open the edited gc.log using GCViewer and find a section of the log where the system performing badly. Timestamps are useful for this.
The log will look something like:



The important things to look at in the graph are:
  • The saw tooth lines up. Each "tooth" where the memory usage drops a little represents a partial garbage collection. These partial garbage collections only grab the free memory that is easily available. These collections are fast, but not very effective.
  • The dramatic drops represent a full garbage collection. When you run out of memory, your system will do a full garbage collection, which should free up all the space that is available. This garbage collection is much slower, but is much more effective.
  • The gray box indicates a system pause for garbage collection. If you see a large gray box like the one on the right of the graph, this is a noticeable pause.
  • The amount of memory that is available after a full garbage collection remains approximately constant. This indicates that we are using memory quickly, but releasing it when we are done. If the amount of used memory was increasing after a full collection, that would indicate a memory leak. A leak happens when you hold on to a memory reference when you are done processing instead of releasing it.
Based on the analysis of this graph, I knew that there was not a memory leak, but instead that the system was using too much memory, too quickly. This was causing the JVM to do a full garbage collection every 15-20 seconds, and taking up to 2 seconds for a full collection.

In my next post, I'll talk about how I figured out what was using too much memory and how we fixed it.