Java Tuning

Tuesday, July 4, 2017

Faster Maven builds - Put your CPU to real work

Whenever a process takes too long, I lose my concentration and drift elsewhere. It could be a long compilation or a long-running test. Things should complete faster.
Here's a great post on making your Maven builds go faster. I especially liked the "-T 1C" multi-threaded switch (one build thread per CPU core), as I like knowing that my well-paid-for CPU cores are kept busy.

My unscientific benchmark for a mvn clean install on a big monolith, w/o running tests:
Without "-T 1C" - Total time: 06:02 min
With "-T 1C" - Total time: 03:15 min

For example:
mvn -T 1C clean install -f parent/pom.xml -DskipTests=true -Duser.timezone=UTC

Go fast!

Monday, October 17, 2016

Unreliable affected-rows with conditional upserts in MySQL

This post is about my attempts to rely on the affected-rows value for MySQL conditional UPSERTs, which turned out to be unreliable.

UPSERTs are tempting, with perks like a single round trip to the DB, atomic properties, and simpler SQL client code.
And if your update depends on the existing row data, there's even the conditional UPSERT.
A conditional UPSERT can have one of three outcomes:

A new row is inserted.
An existing row/column is updated (condition evaluated True).
An existing row/column is not updated (condition evaluated False).

The SQL client can tell the UPSERT outcome with the Affected-rows value.

For non-conditional UPSERTs the affected-rows value per row is 1 if the row is inserted as a new row, 2 if an existing row is updated, and 0 if an existing row is set to its current values.

For conditional UPSERTs I saw inconsistent results: affected-rows was 2 or 3 when an existing row was updated (condition true), 2 when an existing row wasn't updated (condition false), and 1 when a new row was inserted.
Such inconsistent behavior could easily cause bugs. I didn't find a clear pattern for when it happens, nor proper documentation of what the expected affected-rows value should be.
Since the ON DUPLICATE KEY UPDATE clause uses IF expressions, I suspect it loses track of their results. Also, each column's condition can evaluate differently from the other columns' conditions.

Conclusion

Without knowing what to expect from the number of affected rows (and with UPSERTs being easy to break when the column update order changes), I decided to ditch them completely. I ended up implementing the conditional logic on the Java client side, using more than one SQL command wrapped in a transaction.

My conditional UPSERT (expect inconsistent affected rows value):

INSERT INTO account_last_touch
(service_id, account_name, user_name, touch_time)
VALUES('123', '456', 'u1', '2016-07-19 12:11:15')
ON DUPLICATE KEY UPDATE
user_name =IF('2016-07-23 12:11:15'>touch_time, 'u1', user_name),
touch_time = IF('2016-07-24 12:11:15'>touch_time, '2016-07-24 12:11:15', touch_time)
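
The client-side alternative mentioned in the conclusion could look roughly like the sketch below. It is not my production code: the class name AccountLastTouchDao, the Outcome enum, and the assumption that (service_id, account_name) is the unique key and touch_time is the only condition are all made up for illustration. The point is that the outcome is decided in Java and returned explicitly, so nothing depends on affected-rows.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

public class AccountLastTouchDao {

    /** Explicit outcome, so the caller never has to interpret affected-rows. */
    public enum Outcome { INSERTED, UPDATED, UNCHANGED }

    /** Pure decision rule: update only when the new touch is strictly newer. */
    public static Outcome decide(boolean rowExists, Timestamp existingTouch, Timestamp newTouch) {
        if (!rowExists) {
            return Outcome.INSERTED;
        }
        boolean newer = existingTouch == null || newTouch.after(existingTouch);
        return newer ? Outcome.UPDATED : Outcome.UNCHANGED;
    }

    public static Outcome touch(Connection conn, String serviceId, String accountName,
                                String userName, Timestamp newTouch) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try {
            boolean rowExists = false;
            Timestamp existingTouch = null;
            // Assumption: (service_id, account_name) is the unique key.
            // FOR UPDATE locks the row, if it exists, until commit or rollback.
            try (PreparedStatement select = conn.prepareStatement(
                    "SELECT touch_time FROM account_last_touch"
                    + " WHERE service_id = ? AND account_name = ? FOR UPDATE")) {
                select.setString(1, serviceId);
                select.setString(2, accountName);
                try (ResultSet rs = select.executeQuery()) {
                    if (rs.next()) {
                        rowExists = true;
                        existingTouch = rs.getTimestamp(1);
                    }
                }
            }
            Outcome outcome = decide(rowExists, existingTouch, newTouch);
            if (outcome == Outcome.INSERTED) {
                try (PreparedStatement insert = conn.prepareStatement(
                        "INSERT INTO account_last_touch"
                        + " (service_id, account_name, user_name, touch_time)"
                        + " VALUES (?, ?, ?, ?)")) {
                    insert.setString(1, serviceId);
                    insert.setString(2, accountName);
                    insert.setString(3, userName);
                    insert.setTimestamp(4, newTouch);
                    insert.executeUpdate();
                }
            } else if (outcome == Outcome.UPDATED) {
                try (PreparedStatement update = conn.prepareStatement(
                        "UPDATE account_last_touch SET user_name = ?, touch_time = ?"
                        + " WHERE service_id = ? AND account_name = ?")) {
                    update.setString(1, userName);
                    update.setTimestamp(2, newTouch);
                    update.setString(3, serviceId);
                    update.setString(4, accountName);
                    update.executeUpdate();
                }
            }
            conn.commit();
            return outcome;
        } catch (SQLException | RuntimeException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

The price is an extra round trip, and two clients inserting the same new key at the same moment can still hit a duplicate-key error or a deadlock, so a caller would probably want to retry in that case.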

Tuesday, January 12, 2016

Is String.hashCode() unique enough?

Given a set of unique Strings, is their set of String.hashCode() values unique enough?
Well... it depends on what you define as enough.
In my case below it was enough. Read how I assessed it.

It's clear that different inputs might map to the same hashcode value (there are only 2^32 possible values), but what are the chances of that happening?
I have one million users, each user owns 50 private items. An item is identified by a UUID.
I had these two conflicting goals:
(I) Represent each item as an integer instead of a UUID
(II) Avoid collisions. Any pair of items owned by the same user should resolve to a different hashcode.

What is enough: I could live with up to 10 users, out of a million, experiencing a collision. Most of these 10 users will never notice the collision. I assume the system will have other bugs with a higher probability than that.

We're all unique

Assessing uniqueness

One way to assess it is computing the statistical probability of such an event. But I preferred a "proof" that any programmer could appreciate, even those without good statistics skills. Therefore I coded a simulation that simply tries it in practice:

Download from Gist

package collisions.test;

import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class UUIDToHashcodeUniquenessTestMain {

 private final static int num_of_users = 1000 * 1000;
 private final static int num_of_stacks = 50;

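 public static void main(String[] args) {
  // Repeat the whole experiment a few times to see how stable the result is.
  for (int iteration = 0; iteration < 5; iteration++) {
   int collisions = 0;
   for (int i = 0; i < num_of_users; i++) {
    collisions += calcCollisionsForUser();
   }
   System.out.println("Iteration " + iteration + ": Had " + collisions + " collisions for " + num_of_users + " users");
  }
 }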

 private static int calcCollisionsForUser() {
  int collisions = 0;
  Set<Integer> uuidSet = new HashSet<Integer>(num_of_stacks * 2);
  for (int i = 0; i < num_of_stacks; i++) {
   String uuid = UUID.randomUUID().toString();
   Integer uuidHashcode = uuid.hashCode();
   if (uuidSet.contains(uuidHashcode)) {
    collisions++;
   }
   uuidSet.add(uuidHashcode);
  }
  return collisions;
 }
}

The program comes back saying that collisions aren't really something to worry about:

Iteration 0: Had 0 collisions for 1000000 users
Iteration 1: Had 0 collisions for 1000000 users
Iteration 2: Had 0 collisions for 1000000 users
Iteration 3: Had 0 collisions for 1000000 users
Iteration 4: Had 0 collisions for 1000000 users
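
For those who do want the statistics, the simulation can be cross-checked with the classic birthday-problem estimate. The sketch below is an addition, not part of the gist, and the class name HashCollisionEstimate is made up. It assumes hashcodes are spread uniformly over the 2^32 possible values, which String.hashCode() does not strictly guarantee for UUID strings.

```java
public class HashCollisionEstimate {

    /**
     * Probability that at least two of `items` values collide when each value
     * is drawn uniformly at random from `space` possible values.
     */
    public static double collisionProbability(int items, double space) {
        // P(no collision) = product over i = 1..items-1 of (1 - i / space).
        // Summing logarithms keeps precision for such tiny probabilities.
        double logNoCollision = 0.0;
        for (int i = 1; i < items; i++) {
            logNoCollision += Math.log1p(-i / space);
        }
        return -Math.expm1(logNoCollision);
    }

    public static void main(String[] args) {
        double space = Math.pow(2, 32);                   // number of possible int hashcodes
        double perUser = collisionProbability(50, space); // 50 items per user
        System.out.printf("Per-user collision probability: %.3e%n", perUser);
        System.out.printf("Expected affected users per 1,000,000: %.3f%n", perUser * 1_000_000);
    }
}
```

Under that uniformity assumption the per-user probability comes out at roughly 2.9e-7, so about 0.3 affected users are expected per million. That is well under my limit of 10 and consistent with the zeros printed above.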

Monday, June 15, 2015

Yet Another Data Scientist - Switching to new blog

I'm switching to writing a new blog, Yet Another Data Scientist, which focuses on Data Science, information retrieval, recommender systems, Lucene, Solr, NLP, and text mining.

Thursday, March 19, 2015

Great Hebrew podcast on startups, entrepreneurship, working lean and measuring success

Useful even if you're developing software in a big enterprise like me.

http://www.shavua.net/

Thursday, October 31, 2013

A unit test to enforce max heap when running Android UT on the PC

2013's mobile devices come with 1-2GB of RAM, yet Android still enforces a very small heap size of only 24MB-64MB (though it keeps increasing over time).

It's pretty easy to write an Android app that drains the heap. For example: caching images w/o an LRU cache, or reading whole files into memory instead of working with streams.

-- Your code will always use up as much memory as the system has (My spin on Parkinson's law).

I'm developing an Android app with a big UT suite that I run in Eclipse on the PC. I noticed that my default heap size there is 256MB, huge compared to mobile, meaning my tests could pass on the PC but still cause an OOME on an actual device.

So I created MobileLikeSmallHeapDuringTestsEnforce, a new unit test that enforces a small heap size during JUnit test execution. Just throw it into any test project you have and you're safe.

Created as a GitHub Gist; you're welcome to make it better.
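
The gist itself isn't reproduced here. As a rough idea of what such a check can look like, here is a minimal sketch, not the original gist: the class name SmallHeapCheck and the 64MB limit are assumptions. It reads the JVM's maximum heap via Runtime.getRuntime().maxMemory() and fails if it is larger than a mobile-like limit.

```java
public class SmallHeapCheck {

    // Assumed mobile-like limit; pick whatever matches your target devices.
    static final long MAX_ALLOWED_HEAP_BYTES = 64L * 1024 * 1024;

    /** True when the given max heap does not exceed the limit. */
    public static boolean isHeapSmallEnough(long maxHeapBytes, long limitBytes) {
        return maxHeapBytes <= limitBytes;
    }

    /** Call this from a unit test; it fails when the JVM heap is too generous. */
    public static void assertSmallHeap() {
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        if (!isHeapSmallEnough(maxHeapBytes, MAX_ALLOWED_HEAP_BYTES)) {
            throw new AssertionError("Max heap is " + (maxHeapBytes / (1024 * 1024))
                    + "MB, expected at most " + (MAX_ALLOWED_HEAP_BYTES / (1024 * 1024))
                    + "MB. Run the tests with a VM argument such as -Xmx64m.");
        }
    }
}
```

In a JUnit project, assertSmallHeap() would be called from a test method, so the whole suite goes red as soon as someone runs it without a small -Xmx setting.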

Tuesday, January 15, 2013

Quickest way for a one-off XML sort - OR - Learning to keep the heavy tools in the shed

What's the quickest way/tool to sort a 1000-entry XML file?

Your requirements: The XML file is parked on your desktop. You only need to sort it once so you can manually examine it. Sort by the tag "relevance:score".

How would you go about it? Would you:

A) Craft a pipe stream of shell utils?

B) Use the heavy tools - a Java main() that uses JDOM?

C) Refresh the XSLT skills you never had?

D) Try your luck with a Python script?

E) Search for an online XML editor tool?

F) Or, my pick at the bottom.

Example xml document to sort:

<feed>
    <entry>
        <title>Bibi</title>
        <score>0.21000001</score>
    </entry>
    <entry>
        <title>Lapid</title>
        <score>0.42000002</score>
    </entry>
    <entry>
        <title>Yechimovich</title>
        <score>0.235</score>
    </entry>
    <!--- 997 more entries -->
</feed>

My Pick: I considered all of the above, but each sounded like a headache for a simple sort operation. So I... threw the file at MS Excel. It turns out Excel can digest it rather well, and then I sorted by the score column. Yes! Surprising, but crappy MS Excel did the job (the original schema had more nesting than the document in the example above).

Life-saver lesson: Spend more time picking the right tool for the job than on the job itself.
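
For comparison, here is roughly what option B costs. This is a sketch only: it uses the JDK's built-in DOM API rather than JDOM, the class name XmlEntrySorter is made up, and it assumes the flat structure of the example above with a plain score tag and the highest score first.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XmlEntrySorter {

    /** Reads the numeric value of the first score tag inside an entry. */
    static double score(Element entry) {
        String text = entry.getElementsByTagName("score").item(0).getTextContent();
        return Double.parseDouble(text.trim());
    }

    /** Returns the same document with its entry elements ordered by score, highest first. */
    public static String sortByScoreDescending(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element feed = doc.getDocumentElement();

            // Copy the live NodeList into a plain list before moving nodes around.
            NodeList nodes = feed.getElementsByTagName("entry");
            List<Element> entries = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                entries.add((Element) nodes.item(i));
            }
            entries.sort((a, b) -> Double.compare(score(b), score(a)));

            // appendChild moves a node that is already in the tree to the end,
            // so re-appending in sorted order reorders the entries.
            for (Element entry : entries) {
                feed.appendChild(entry);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not sort the XML document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<feed>"
                + "<entry><title>Bibi</title><score>0.21000001</score></entry>"
                + "<entry><title>Lapid</title><score>0.42000002</score></entry>"
                + "<entry><title>Yechimovich</title><score>0.235</score></entry>"
                + "</feed>";
        System.out.println(sortByScoreDescending(xml));
    }
}
```

Some sixty lines and fifteen imports for a one-off sort, which is pretty much why the heavy tools stayed in the shed.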


One click to sort