Diego Perini's Personal Blog

My Approach To Developing Software Fast Requires Improvements (Android edition)

2016-10-23T00:00:00+00:00

I feel like it has finally arrived: being a generalist programmer/engineer slaps me in the face like it has never done before. I try my best to dodge these slaps but they keep coming, and it is starting to hurt. This may sound too sentimental, and I agree. This is one of those times when I need teachers I can ask whatever I want, hoping they don’t tell me to go away and do some research. I need answers, not a new article or book to read. Eventually, I’ll go back to reading but right now, I may not have time to do so without consequences. The “I” in the title is actually me and my team in our startup company. Enough chitchat, let’s talk programming.

Problem: There seems to be a limit to how much boilerplate code one can eliminate

Here is the pseudo code of how we iterate with an example. It is from an Android app project written in native Java and backed by a few 3rd party dependencies.

Like many other people out there, we use Fragments. They enable instant screen transitions and code sharing in a single Activity architecture, at the cost of a large number of null checks and complex life cycle handling. We iterate to reduce or hide these costs without losing the benefits.

Initially,

1. We start a new scene by extending native Fragment
2. Implement default overrides
3. Test, crash then fix a lot until it is stable
4. Start another scene by extending native Fragment again
5. Do #2 and #3 again
6. Compare last two scenes to detect shared boilerplate
7. Extend Fragment and name it BaseFragmentV1
8. Transfer boilerplate there or generalize the logic
9. Start a new scene by extending BaseFragmentV1
10. Provide generic arguments if there are any
11. Implement newly defined default overrides (hopefully fewer of them)
12. Do #5
13. Do #9, #10, #11 and #12
14. Do #6
15. Extend BaseFragmentV1 and name it BaseFragmentV2
16. Go to #9 and iterate by increasing BaseFragment version on each iteration
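
To make the iteration above more concrete, here is a minimal sketch of two rounds of it. It is not our production code: `FakeFragment` is a hypothetical stand-in for Android's Fragment so that the example runs on a plain JVM, and `loadModel`, `render`, `refresh` and `ProfileScene` are names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Android's Fragment (assumption: no Android SDK here).
abstract class FakeFragment {
    final List<String> log = new ArrayList<>();

    // Greatly simplified life cycle hook.
    void onCreateView() { }
}

// Iteration 1: boilerplate shared by the first two scenes moves here.
abstract class BaseFragmentV1<M> extends FakeFragment {
    // The "default overrides" every scene still has to implement.
    protected abstract M loadModel();
    protected abstract void render(M model);

    @Override
    final void onCreateView() {
        M model = loadModel();
        // The null check each scene used to repeat by hand.
        if (model == null) {
            log.add("empty");
            return;
        }
        render(model);
    }
}

// Iteration 2: boilerplate detected while comparing V1 scenes (here: refresh).
abstract class BaseFragmentV2<M> extends BaseFragmentV1<M> {
    void refresh() {
        log.add("refresh");
        onCreateView();
    }
}

// A scene now only provides its generic argument and two overrides.
class ProfileScene extends BaseFragmentV2<String> {
    @Override protected String loadModel() { return "john"; }
    @Override protected void render(String model) { log.add("render:" + model); }

    public static void main(String[] args) {
        ProfileScene scene = new ProfileScene();
        scene.refresh();
        System.out.println(scene.log);
    }
}
```

Each new version only absorbs what two concrete scenes had in common, which is why the gains shrink once the remaining code is specific to a single scene's data.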

Around the fifth iteration, a limit is hit. The reason is simple. No matter what you do, there is no such thing as a generic model for your data. Many models are nested and used by only one fragment. Since they are not generic, any web service client abstraction and/or custom view that manipulates them has to be implemented by hand from scratch. Utilities indeed exist to make these implementations a plug and play game.

Not a F2P

This game has a cost for our developers. It has guidelines, best practices, worst practices, illegal usages and, most importantly, it requires dirty hacks when UX designers invent smart widgets. The thing is, they tend to do that a lot.

That is the essential cause of the problem in the title. It is hard to avoid. From the design perspective, the requested feature is actually generic to the end user. It is easy to learn, fun to use and reduces click count per action. It is quite lengthy in terms of coding though.

If there is a way to automate implementation of such use cases, I’d like to know how. Maybe we need another kind of iteration process, maybe there exist helpful 3rd party libraries that we don’t know. I wish I knew a way out of this.

Our tools

Here is our tool belt for the curious reader. We already have,

A stable main Activity that never crashes
A generic Fragment to never have to bind views by hand
The same fragment also handles data refresh logic and state persistence
A central dispatch for cache operations
A central dispatch for navigating between Fragments
A generic pop up with configurable text fields and action callbacks
A generic animation library for chaining and interrupting time based and control based animations (to be made open source soon)
A generic Switch widget
A generic RootView and inflater to be able to use with Android’s data binding utilities (this is awesome actually)
One liners for HTTP client calls
One liner threading utilities
One liner resource handling utilities
One liner generic observers
One liner date, string, math and list utilities

What we need is,

A way to reduce the amount of work needed to write new data models. Reflection based JSON parsers/generators are not suitable.
A design pattern to express view state without modifying the represented data. I don’t want to add view specific bool flags to a model. A state object that wraps the model without modifying it looks dirty but may be the way to go. It seems dirty when a Fragment needs to persist that state instead of just the data model’s entity object.
A client generator to be able to benefit from server docs in order to generate client calls (something more lightweight than Swagger and easy to inject into our own HTTP client).
Much better Intent handling for push notifications and result expecting Activity transitions. Native Android approach looks hideous. We still use what is documented in Android’s developer guides.
A way to optimize view count in a RecyclerView cell. RelativeLayouts are all over the place right now. LinearLayouts that nest other LinearLayouts suck too. Maybe there is a way to write flat view hierarchies using FrameLayout and ConstraintLayout without losing development efficiency.
A way to reduce build times. It is around 100 seconds right now. Multidex looks unavoidable.
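
For the view state item in the list above, this is roughly what I mean by a state object that wraps the model. It is only a sketch under my own assumptions: `Post`, `ViewState` and the flag names are hypothetical, and a real Fragment would write the snapshot into its saved instance state instead of a plain map.

```java
import java.util.HashMap;
import java.util.Map;

// An immutable model entity, untouched by UI concerns.
final class Post {
    final long id;
    final String title;

    Post(long id, String title) {
        this.id = id;
        this.title = title;
    }
}

// Hypothetical wrapper: view-only flags live next to the model, not inside it.
final class ViewState<M> {
    final M model;
    private final Map<String, Boolean> flags = new HashMap<>();

    ViewState(M model) {
        this.model = model;
    }

    ViewState<M> set(String flag, boolean value) {
        flags.put(flag, value);
        return this;
    }

    // Unknown flags default to false.
    boolean is(String flag) {
        return flags.getOrDefault(flag, false);
    }

    // What a Fragment would have to persist in addition to the model itself.
    Map<String, Boolean> snapshot() {
        return new HashMap<>(flags);
    }

    void restore(Map<String, Boolean> saved) {
        flags.clear();
        flags.putAll(saved);
    }

    public static void main(String[] args) {
        Post post = new Post(7L, "Hello");
        ViewState<Post> state = new ViewState<>(post).set("expanded", true);
        System.out.println(state.is("expanded") + " " + state.model.title);
    }
}
```

The model stays clean, but the dirty part is visible too: the Fragment now has two things to persist, the entity and the snapshot.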

Final Words

I’m not sure if we hit the platform limit or if it is just a temporary hassle that will eventually disappear with better tooling, extensive design pattern usage or just more experience. If someone thinks some things in our tool belt look useful, we are more than happy to make them open source in the future.

Expect a similar post for iOS once I am a mature enough developer in that field. In the meantime, enjoy criticizing anything in this post.

Peace.

Fighting with a language is a bad idea

2015-05-27T00:00:00+00:00

I have the habit of making the same mistakes over and over again, not because programming language features are badly advertised, but because I get too excited when I find a specific feature I like in a language. My enthusiasm clouds my judgment, then the problem with my approach strikes me at the most inconvenient time. No library fixes a problem introduced solely by bad practice.

A Small Thought Experiment

Computer engineering/science education in many schools follows a deductive approach to inspect underlying concepts behind high level abstractions. Many curricula start with an easy to grasp programming language such as Python or PHP, or with subsets of Java, C#, C or C++. Then they introduce limits like memory space, I/O delays, network fault tolerance, cache misses and software scheduling. A curious student can easily find out that many programming languages try to hide these limits from their users to become applicable to certain use case scenarios.

Here is my experiment. If there were a school that taught Javascript (on Node.js or io.js without explaining what they are) to its first-year students just for writing small console programs, I believe these students would drown under the heavy burden when they tried to use their new skills in a browser.

Browsers are like grumpy parents: they get angry at what you do all the time. No long running loops, functions that do I/O are void functions with callback parameters, no initial file system, and HTTP as the main network communication tool (which is a monster if you read its history). All these limits are legit and there for security and hardware related reasons.

So, when an algorithm fails to run on your target platform although the language allows you to try it, who is at fault here? Your browser or the language?

You are Fighting the Wrong Battle

Language inventors are not stupid. I may hate Java because of reasons but I have no doubt that its creators and maintainers are really smart and hardworking people. They wanted their language to support every platform out there and built tools to render all kinds of problems that arise from these platforms irrelevant. When you cleanly hide 1000 problems for your users, you are allowed to introduce a few new ones, okay? I believe a regular user would hardly ever need to face those few.

The struggle is pointless because the bottleneck is always the environment, not the language.

Your singleton dies not because Java static is buggy, but because your Android phone doesn’t keep its data in device memory all the time.

Javascript is single threaded because otherwise a browser with many open tabs would quickly deplete all the CPU time available in your PC.

Mind Your Manners

That is actually what it is all about. APIs, languages and protocols are the only things that computers can understand. Be polite and don’t swear at your computer with them.

P.S. In this post, the addressee “you” is me. :)

Cheers

Client is a valid storage

2015-04-23T00:00:00+00:00

I’ve been developing backend web software for the last few years and during this small interval, it became a common task to implement database layers, user authentication, client side caching, tracking of unauthenticated traffic and user metrics collection. Each time a project included such a requirement, learning it through hacking was always the best approach for me. My own computer science degree, Stack Overflow answers and blog posts of people from open source communities are my main sources here. In this post, I will try to use simple examples to explain where one can store application data in a client/server infrastructure and what implications may arise from using each storage type. At the end, there is a guideline I follow when I use client side storage (web and mobile) for applications that communicate over the web. Reach me if you believe some statements in this post are incorrect or bad advice.

Ultimate Paranoia

I like to think that every client user is an expert hacker who can read and modify your client source code at will. I assume every configuration file and every piece of persistent data is stored in plain text files and none of that data is sandboxed. In reality, mobile browsers and app storages handle their data more securely, but it is best to assume the worst.

In other words, there are actually only two locations you deal with as storage in terms of security. “Within reach” which means cookies (web), local storage (HTML5), session storage (HTML5), user defaults (iOS), shared preferences (Android), SQLite (iOS, Android) etc. and “beyond reach” which is your server and its databases, local files, SaaS components.

Classifying Data

We will semantically classify our data as “first class”, “second class” and group them in different ways to represent “states” and “leftovers”. I will use this RESTful sign-in response of an example social app to mark them. Eventually, we will distribute this data to our server/client storages without exposing a vulnerability in our app.

  
// A sign-in response
{
    "id": 69395395,
    "username": "john",
    "age": 19,
    "gender": "male",
    "access_token": "3e416ab380add736c1b_1000322",
    "server_time": "2015-04-23 02:25:00"
}

When the entity represented by the data is a mirror of your model’s attributes (a user’s profile, a post, a media content), this definition will call it first class data. These entities are exchanged when the clients request the latest version of the model or request changes to it. The values inherited from first class data are significant to the user. In our social app, you retrieve such data to fill a profile screen and allow manipulations to it via predefined inputs and restricted API calls. Let’s mark them.

  
// A sign-in response
{
    "id": 69395395,                                 // first class
    "username": "john",                             // first class
    "age": 19,                                      // first class
    "gender": "male",                               // first class
    "access_token": "3e416ab380add736c1b_1000322",
    "server_time": "2015-04-23 02:25:00"
}

Second class data exist only to identify the owner of their first class siblings. When alone, they are gibberish to the user. Once attached to additional first class data, they are used to determine targets of updates, source of retrievals or means of their owner entity.

Here they are.

  
// A sign-in response
{
    "id": 69395395,                                 // first and second class
    "username": "john",                             // first class (also a second class candidate if usernames are unique)
    "age": 19,                                      // first class
    "gender": "male",                               // first class
    "access_token": "3e416ab380add736c1b_1000322",  // second class
    "server_time": "2015-04-23 02:25:00"            // second class
}

A piece of data can belong to more than one class. John’s id is both first and second class data as without it, John’s model would not be complete.

Gathering Data Together

A state is a collection of at least one piece of second class data with zero or more first class data. We use second class data to secure the validity of a state. We can’t trigger and get status updates for “john 19 male” without knowing which “john” we are dealing with. John cannot update his profile without proving that he really is “john#69395395”. If it is almost New Year’s Eve in the country where our server is located (let’s assume the US), then John may already be 20 if he lives somewhere else (e.g. Australia).

There can be other collections that only consist of first class data. We define such collections as leftovers since they lose their tie to the model without a second class sibling.
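
The two definitions above boil down to a membership check. Here is a minimal sketch of it, assuming the set of second class field names is taken from the markings on the sign-in response; `CollectionKind`, `classify` and `fields` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class CollectionKind {
    // Assumption: mirrors the fields marked as second class in the sign-in response.
    static final Set<String> SECOND_CLASS =
            new HashSet<>(Arrays.asList("id", "access_token", "server_time"));

    // A state holds at least one second class field; anything else is a leftover.
    static String classify(Set<String> fieldNames) {
        if (fieldNames.isEmpty()) {
            throw new IllegalArgumentException("a collection needs at least one field");
        }
        for (String name : fieldNames) {
            if (SECOND_CLASS.contains(name)) {
                return "state";
            }
        }
        return "leftover";
    }

    // Small helper to build a field name set.
    static Set<String> fields(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        System.out.println(classify(fields("username")));
        System.out.println(classify(fields("username", "age", "server_time")));
    }
}
```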

Our effort is to store as much data as we can on the relevant side while maintaining a minimum number of state entries on our server. We will hand-pick some collections to introduce both security and better user experience.

Easiest ones to pick are collections that cover their model entirely and their model only.

  
// John's model (a state)
{
    "id": 69395395,         // first and second class
    "username": "john",     // first class
    "age": 19,              // first class
    "gender": "male"        // first class
}

This collection is all John is. Whoever asks for John will need these values to identify him. Since nobody should wait for John to be online all the time, this collection needs to be stored on the server for real time retrieval.

When John fills input boxes to sign-in, he provides a way for our app to remember his name on the same device in the future. Let’s pick a collection to make his next visit an easier one.

  
// Autocomplete username (a leftover)
{
    "username": "john"      // first class
}

We can safely store this collection in our client to automatically fill the username field in our sign-in box.

John has a personal computer that only he can access, and therefore wants to be kept signed-in by our app all the time. He is not comfortable with the fact that his password is remembered by his browser and we want to make sure he feels safe. We need a replacement for his password that will help us identify him on each of his requests. We call it an access token.

One creates an access token by digitally signing an authentication request of a user. When John provides his username and password to sign in, he also sends additional info that is specific to his device. The more additional data he sends, the harder it becomes for an attacker to mimic his device (even if his access token is exposed). We can use his IP, browser name, OS name, sign-in date and device id (if it exists) to generate an access token which can expire after some time, on network change, on different browsers and on a different operating system. Once I finalize this post, I will add my answer on Stack Overflow as an appendix entry for an example token generation algorithm.
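
Until that appendix entry exists, here is a minimal sketch of the idea. It is not the algorithm from my Stack Overflow answer: it assumes HMAC-SHA256 as the signature, a hypothetical server-only secret, and a payload made of the user id joined with whatever device attributes the client sent. Expiry is not handled, apart from the fact that a changed attribute invalidates the token.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class AccessTokens {
    // Signs "userId|attr1|attr2|..." with a secret that never leaves the server.
    static String sign(String secret, long userId, String... deviceInfo) {
        String payload = userId + "|" + String.join("|", deviceInfo);
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    // Recomputes the signature from the current request and compares it to the token.
    static boolean validate(String secret, String token, long userId, String... deviceInfo) {
        byte[] expected = sign(secret, userId, deviceInfo).getBytes(StandardCharsets.UTF_8);
        byte[] actual = token.getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual avoids leaking the mismatch position through timing.
        return MessageDigest.isEqual(expected, actual);
    }

    public static void main(String[] args) {
        String token = sign("server-secret", 69395395L, "203.0.113.7", "Firefox", "Linux");
        System.out.println(validate("server-secret", token, 69395395L, "203.0.113.7", "Firefox", "Linux"));
    }
}
```

Because the server recomputes the signature from the incoming request, it does not have to store the token itself, and a token copied to another network or browser stops validating.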

We choose John’s access token alone as our collection and store it in our client.

  
// John's token (a state)
{
    "access_token": "3e416ab380add736c1b_1000322"       // second class
}

John is the only person who can modify his profile. In other words, as long as he doesn’t modify it, refreshing his profile page or accessing partial info from his profile shouldn’t trigger an HTTP request. It will reduce network data consumption of his device and ease some burden on our server. Also his profile will be ready all the time. He may be using multiple clients at the same time so we may want to invalidate our cached profile after some time.

  
// John's profile (a state)
{
    "username": "john",                                 // first class
    "age": 19,                                          // first class
    "gender": "male",                                   // first class
    "server_time": "2015-04-23 02:25:00"                // second class
}

Here is a useful cache entry for our client. We stored server time to later decide if the cached data is too old.
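
The age check on that cache entry could look like the sketch below. It assumes the timestamp format of the sample response and that the client has some estimate of the current server time (for example from the latest response it received). `ProfileCache`, `isStale` and the ten minute limit are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class ProfileCache {
    // Matches the "server_time" format used in the sample response.
    private static final DateTimeFormatter SERVER_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // True when more than maxAge has passed between the cached and the current server time.
    static boolean isStale(String cachedServerTime, String currentServerTime, Duration maxAge) {
        LocalDateTime cached = LocalDateTime.parse(cachedServerTime, SERVER_TIME);
        LocalDateTime now = LocalDateTime.parse(currentServerTime, SERVER_TIME);
        return Duration.between(cached, now).compareTo(maxAge) > 0;
    }

    public static void main(String[] args) {
        Duration maxAge = Duration.ofMinutes(10); // assumed limit
        System.out.println(isStale("2015-04-23 02:25:00", "2015-04-23 02:40:00", maxAge));
    }
}
```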

Security Guideline for Storages

We can implement many more features for John. The good thing is, as long as we stick to this methodology, it is always apparent where to put our collections. There is a pattern here. We are now ready for the guideline I mentioned before.

On preparing first class data:

Define all accepted and unaccepted values on server and implement proper input validation both on your server and client. Nobody can be -1 years old after all.
Some first class data never require storage. Try to produce them on the server by processing existing data first (e.g. counting people in a friend list).
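
As an illustration of the first rule above, here is a tiny validator that could be shared by server and client code. The bounds and the username pattern are assumptions of mine, not requirements of the example app, and `ProfileValidator` is a hypothetical name.

```java
final class ProfileValidator {
    // Assumed bounds for an accepted age.
    static final int MIN_AGE = 0;
    static final int MAX_AGE = 150;

    static boolean isValidAge(int age) {
        return age >= MIN_AGE && age <= MAX_AGE;
    }

    // Assumed rule: 3 to 20 lowercase letters, digits or underscores.
    static boolean isValidUsername(String username) {
        return username != null && username.matches("[a-z0-9_]{3,20}");
    }

    public static void main(String[] args) {
        System.out.println(isValidAge(-1) + " " + isValidAge(19) + " " + isValidUsername("john"));
    }
}
```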

On preparing second class data:

Only generate them on the server. Your generation algorithm must be hidden from your client. You can sign, encrypt and/or obfuscate them if necessary. (see appendix to learn how)
Validate them on every retrieval on your server (see appendix to learn how).

On choosing collections:

Choose the smallest state (a collection with one second class data) that can represent your model alone and store it on your server’s most persistent storage (e.g. a database).
For each feature that serves your user, store a state (a collection with one second class data) on the client.
Leftovers (collections without second class data) can be stored on client without hesitation. Never send them back to server though.
If your client state is identical to your server state, your client’s second class data exposes a vulnerability. Replace your client state’s second class data with a new one generated by using the one in the server as input. (see appendix to learn how)
A state can only be valid if all its second class data can be validated by your server.

Appendix

How to generate and validate an authentication token in PHP

I'd rather do it for myself - Part 1

2014-11-25T00:00:00+00:00

Since this also stands to be the very first post of this blog, I wanted it to be about me instead of some random revolutionary idea about programming that is rather difficult to implement but extremely useful in cases where my ego has the top priority over other solutions.

Dear readers (and future me),

Please be aware that whatever is written in this post (including the upcoming ones) must be taken with a grain of salt. Ideas are almost always subjective in nature before they are proven true by science. I do not intend to provide proofs of any kind, nor do I require them to express my feelings and opinions. I tend to prove myself wrong in yearly intervals, which points to progress I guess. I trust your skepticism and thank you for it. This blog will kind of depend on that.

Let’s set our clocks back a few years in order to clearly inspect the beginning of this “I’d Rather Do It For Myself” principle.

I don’t think I like competitions when they don’t test both skills and hard work at the same time. Skills are hard to discover. Basically, you can’t know that you are good at something before trying to do it for the first time. The fundamental difficulty of this discovery process is that the required number of trials may often be quite large. Boredom is a powerful obstacle, even if you manage to find the thing you are looking for. It is not possible to make a profession out of a skill before working hard on it. This is a spoiler I luckily managed to deduce by simply reading the news. Exceptional people don’t become news before making something ground breaking. The more I read and watch, the more I find out that each one of these people initially spent their time on activities that are extremely boring.

Cool! Let’s put that into the “TODO LIST of my life” as first entry.

“Spot the least boring skill of yours and learn not to get bored when you work on it.”

To be continued…