Handling Time Zone in JavaScript

Recently, I worked on adding a time zone feature to the TOAST UI Calendar, the JavaScript calendar library managed by my team. I knew quite well that time zone support in JavaScript is poor, but I hoped that abstracting existing data objects would easily resolve many problems.

However, my hope was misplaced, and I found it harder and harder to handle time zones in JavaScript as I progressed. Implementing time zone features beyond simple formatting of time, and calculating time data with complex operations (e.g. a calendar), was a truly daunting task. For this reason, I had a valuable and thrilling experience of solving one problem only to cause more problems.

The purpose of this article is to discuss the issues and solutions related to implementing time zone features in JavaScript. As I was writing this rather lengthy article, I suddenly realized that the root of my problem lay in my poor understanding of the time zone domain. In this light, I will first discuss the definitions and standards related to time zones in detail, and then talk about JavaScript.

What Is a Time Zone?

A time zone is a region that follows a uniform local time legally defined by a country. It’s common for countries to have their own unique time zones, and some large countries, such as the USA or Canada, even have multiple time zones. Interestingly, even though China is large enough to have multiple time zones, it uses only one. This sometimes results in awkward situations where the sun rises around 10:00 AM in the western part of China.

GMT, UTC, and Offset

GMT

Korean local time is normally GMT+09:00. GMT is an abbreviation for Greenwich Mean Time, which is the clock time at the Royal Observatory in Greenwich, U.K., located at longitude 0. The GMT system began spreading on Feb. 5, 1925 and remained the world time standard until Jan. 1, 1972.

UTC

Many consider GMT and UTC the same thing, and the two are used interchangeably in many cases, but they are actually different. UTC was established in 1972 to compensate for the slowing of the Earth’s rotation. This time system is based on International Atomic Time, which uses the cesium atomic frequency to set the time standard. In other words, UTC is the more accurate replacement for GMT. Although the actual time difference between the two is tiny, UTC is nonetheless the more accurate choice for software developers.

When the system was still in development, anglophones wanted to name it CUT (Coordinated Universal Time) and francophones wanted to name it TUC (Temps Universel Coordonné). However, neither side won the fight, so they agreed on UTC instead, as it contained all the essential letters (C, T, and U).

Offset

+09:00 in UTC+09:00 means the local time is 9 hours ahead of UTC standard time. This means that it’s 09:00 PM in Korea when it’s 12:00 PM in a UTC region. The difference in time between UTC standard time and local time is called the “offset”, which is expressed like this: +09:00, -03:00, etc.

It’s common for countries to name their time zones using their own unique names. For example, the time zone of Korea is called KST (Korea Standard Time), and it has a certain offset value expressed as KST = UTC+09:00. However, the +09:00 offset is used not only by Korea but also by Japan, Indonesia, and many others, which means the relation between offsets and time zone names is not 1:1 but 1:N. The list of countries on the +09:00 offset can be found in UTC+09:00.

Some offsets are not strictly on an hourly basis. For example, North Korea uses +08:30 as its standard time, while Australia uses +08:45 or +09:30 depending on the region.

The entire list of UTC offsets and their names can be found in List of UTC Time offsets.

Time zone !== offset?

As I mentioned earlier, we use the names of time zones (KST, JST) interchangeably with offsets without distinguishing them. But it’s not right to treat a region’s time zone and its offset as the same, for the following reasons:

Summer Time (DST)

Although this term might be unfamiliar in some countries, many countries around the world have adopted summer time. “Summer time” is the term mostly used in the U.K. and other European countries; internationally, it is normally called Daylight Saving Time (DST). It means advancing clocks one hour ahead of standard time during the summer months.

For example, California in the USA uses PST (Pacific Standard Time, UTC-08:00) during winter and PDT (Pacific Daylight Time, UTC-07:00) during summer. The regions that use these two time zones are collectively said to be on Pacific Time (PT), and this name is adopted by many regions of the USA and Canada.

Then the next question is exactly when summer time begins and ends. In fact, the start and end dates of DST vary from country to country. For example, in the USA and Canada, DST used to run from the first Sunday of April at 02:00 AM to the last Sunday of October at 12:00 AM until 2006; since 2007, DST has begun on the second Sunday of March at 02:00 AM and ended on the first Sunday of November at 02:00 AM. In Europe, summer time is applied uniformly across countries, while in the USA, DST is applied to each time zone progressively. To make the “Nth Sunday” rule concrete, a small sketch follows.
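
Here is a minimal JavaScript sketch of computing such a rule; the helper name nthSunday is my own, for illustration only:

function nthSunday(year, monthIndex, n) {
  const first = new Date(year, monthIndex, 1);
  const daysUntilSunday = (7 - first.getDay()) % 7; // getDay(): 0 = Sunday
  return new Date(year, monthIndex, 1 + daysUntilSunday + (n - 1) * 7);
}

// US DST start in 2017: the second Sunday of March
nthSunday(2017, 2, 2).toDateString(); // "Sun Mar 12 2017"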

Do Time Zones Change?

As I briefly mentioned earlier, each country has the right to determine which time zone to use, which means its time zone can change for political and/or economic reasons. For example, in the USA, the period of DST was changed in 2007 because President George W. Bush signed the Energy Policy Act in 2005. Egypt and Russia used to observe DST, but they stopped in 2011.

In some cases, a country can change not only its DST but also its standard time. For example, Samoa used to use the UTC-10:00 offset, but later changed to UTC+14:00 to reduce the trading losses caused by the time difference with Australia and New Zealand. This decision caused the country to skip the whole day of Dec. 30, 2011, and it made newspapers all over the world.

The Netherlands used a +00:19:32.13 offset, which is unnecessarily precise, from 1909; it changed to +00:20 in 1937, and then to +01:00 in 1940, which it has kept ever since.

Time Zone 1 : Offset N

To summarize, a time zone can have one or more offsets. Which offset a country will use as its standard time at a certain moment can vary due to political and/or economic reasons.

This is not a big issue in everyday life, but it is when trying to systematize it based on rules. Let’s imagine that you want to set a standard time for your smartphone using an offset. If you live in a DST-applied region, your smartphone time should be adjusted whenever DST starts and ends. In this case, you would need a concept that brings standard time and DST together into one time zone (e.g. Pacific Time).

But this cannot be implemented with just a couple of simple rules. For example, as the USA changed the dates on which DST starts and ends in 2007, Mar 31, 2006 should use PST (-08:00) as the standard time while Mar 31, 2007 should use PDT (-07:00). This means that to refer to a specific time zone, you must know all the historical data of standard time changes and the points in time when DST rules changed.

You can’t simply say, “New York’s time zone is EST (-05:00).” You must be more specific, saying, for instance, “New York’s current time zone is EST.” However, we need a more accurate expression for the sake of system implementation. Forget the phrase “time zone”; you need to say, “New York is currently using EST as its standard time.”

Then what should we use, other than an offset, to designate the time zone of a specific region? The answer is the name of the region. To be more specific, you should group regions where the changes in DST or standard time have been uniformly applied into one time zone and refer to it as appropriate. You might be able to use names like PT (Pacific Time), but such a term only combines the current standard time and its DST, not necessarily all the historical changes. Furthermore, since PT is currently used only in the USA and Canada, you need better-established standards from trusted organizations in order for software to be used universally.

IANA Time Zone Database

To tell you the truth, time zones are more of a database than a collection of rules, because they must contain all the relevant historical changes. There are several standard databases designed to handle time zone issues, and the most frequently used one is the IANA Time Zone Database. Usually called the tz database (or tzdata), the IANA Time Zone Database contains the historical data of local standard times around the globe and of DST changes. The database is organized to contain all historical data currently verifiable, to ensure the accuracy of time since the Unix epoch (1970-01-01 00:00:00). Although it also has data before 1970, the accuracy is not guaranteed.

The naming convention follows the Area/Location rule. Area usually refers to the name of a continent or an ocean (Asia, America, Pacific), while Location is the name of a major city such as Seoul or New York rather than the name of a country (this is because the lifespan of a country is far shorter than that of a city). For example, the time zone of Korea is Asia/Seoul and that of Japan is Asia/Tokyo. Although the two countries share UTC+09:00, both have different time zone histories, which is why they are handled as separate time zones.

The IANA Time Zone Database is managed by numerous communities of developers and historians. Newly found historical facts and governmental policy changes are updated to the database right away, making it the most reliable source. Furthermore, many UNIX-based OSs, including Linux and macOS, and popular programming languages, including Java and PHP, use this database internally.

Note that Windows is not in the support list above. That’s because Windows uses its own database, called the Microsoft Time Zone Database. However, this database does not accurately reflect historical changes and is managed only by Microsoft. Therefore, it is less accurate and reliable than IANA.

JavaScript and IANA Time Zone Database

As I briefly mentioned earlier, the time zone support in JavaScript is quite poor. Since it follows the time zone of the region by default (to be more specific, the time zone selected at the time of the OS installation), there is no way to change it to a different time zone. Also, its specification for a database standard is not even clear, which you will notice if you take a close look at the ES2015 specification. Only a couple of vague statements address the local time zone and DST availability. For instance, DST is defined as follows: ECMAScript 2015 — Daylight Saving Time Adjustment

An implementation dependent algorithm using best available information on time zones to determine the local daylight saving time adjustment DaylightSavingTA(t), measured in milliseconds. An implementation of ECMAScript is expected to make its best effort to determine the local daylight saving time adjustment.

It looks like it is simply saying, “Hey, guys, give it a try and do your best to make it work.” This leaves a compatibility problem across browser vendors as well. You might think “That’s sloppy!”, but then you will notice another line right below:

NOTE : It is recommended that implementations use the time zone information of the IANA Time Zone Database http://www.iana.org/time-zones/.

Yes. The ECMA specification tosses the ball to you with this simple recommendation of the IANA Time Zone Database, and JavaScript has no standard database prepared for you. As a result, different browsers use their own operations for time zone calculation, and they are often not compatible with one another. The ECMA specifications later added an option to use IANA time zones in ECMA-402, the Intl.DateTimeFormat internationalization API. However, this option is still far less reliable than what other programming languages offer.

Time Zone in Server-Client Environment

Let’s assume a simple scenario in which time zones must be considered. Say we’re going to develop a simple calendar app that handles time information. When a user enters a date and time on the registration page in the client environment, the data is transferred to the server and stored in the DB. Then the client receives the registered schedule data from the server and displays it on screen.

There is something to consider here though. What if some of the clients accessing the server are in different time zones? A schedule registered for Mar 11, 2017 11:30 AM in Seoul must be displayed as Mar 10, 2017 09:30 PM when looked up in New York. For the server to support clients from various time zones, the schedule stored in the server must have an absolute value that is not affected by time zones. Each server has a different way to store absolute values, and that is outside the scope of this article since it depends on the server and database environment. However, for this to work, the date and time transferred from the client to the server must be values based on the same offset (usually UTC) or values that also include the time zone data of the client environment.

It’s common practice to transfer this kind of data as Unix time based on UTC, or as an ISO-8601 string containing the offset information. In the example above, if 11:30 AM on Mar 11, 2017 in Seoul is converted into Unix time, it becomes the integer 1489199400. Under ISO-8601, it becomes the string 2017-03-11T11:30:00+09:00.
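
For instance, assuming a browser whose local time zone is Seoul, a quick sketch of producing both kinds of values:

// Assuming the browser's local time zone is Asia/Seoul (+09:00)
const d = new Date(2017, 2, 11, 11, 30);
d.getTime() / 1000; // 1489199400 -- Unix time, in seconds
d.toISOString();    // "2017-03-11T02:30:00.000Z" -- the same instant, expressed in UTC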

If you’re working on this with JavaScript in a browser environment, you must convert the entered value as described above and then convert it back to fit the user’s time zone. Both of these tasks have to be handled. In programming-language terms, the former is called “parsing” and the latter “formatting”. Now let’s find out how these are handled in JavaScript.

Even when you’re working with JavaScript in a server environment using Node.js, you might have to parse the data retrieved from the client, depending on the case. However, since servers normally have their time zone synced to the database, and the task of formatting is usually left to clients, you have fewer factors to consider than in a browser environment. In this article, my explanation will be based on the browser environment.

Date Object in JavaScript

In JavaScript, tasks involving dates or times are handled using the Date object. It is a native object defined in ECMAScript, like Array or Function, and is mostly implemented in native code such as C++. Its API is well described in the MDN documentation. It is greatly influenced by Java’s java.util.Date class and, as a result, inherits some undesirable traits, such as mutability and months beginning at 0.

JavaScript’s Date object internally manages time data as an absolute value, namely Unix time. However, constructors and methods such as parse(), getHours(), setHours(), etc. are affected by the client’s local time zone (the time zone of the OS running the browser, to be exact). Therefore, if you create a Date object directly from user input data, the data will reflect the client’s local time zone.
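
A quick sketch of this locality effect, reusing the value from the running example:

const d = new Date(1489199400000); // one absolute instant
d.getUTCHours(); // 2 -- the same wherever the code runs
d.getHours();    // 11 in Seoul, but 21 (on Mar 10) in New York: local getters differ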

As I mentioned earlier, JavaScript does not provide any arbitrary way to change time zone. Therefore, I will assume a situation here where the time zone setting of the browser can be directly used.

Creating Date Object with User Input

Let’s go back to the first example. Assume that a user entered 11:30 AM, Mar 11, 2017 on a device following the Seoul time zone. This data is stored as five integers, 2017, 2, 11, 11, and 30, each representing the year, month, day, hour, and minute, respectively. (Since months begin at 0, the month value must be 3 - 1 = 2.) With the constructor, you can easily create a Date object from these numeric values.

const d1 = new Date(2017, 2, 11, 11, 30);
d1.toString(); // Sat Mar 11 2017 11:30:00 GMT+0900 (KST)

If you look at the value returned by d1.toString(), you will see that the created object’s absolute value is 11:30 AM, Mar 11, 2017, based on the offset +09:00 (KST).

You can also use the constructor with string data. If you pass a string value to the Date constructor, it internally calls Date.parse() to calculate the proper value. This function supports the RFC 2822 and ISO-8601 specifications. However, as described in the MDN Date.parse() documentation, the return value of this method varies from browser to browser, and the format of the string can make the exact value hard to predict. Thus, it is recommended not to use this method.

For example, a string like 2015-10-12 12:00:00 returns NaN on Safari and Internet Explorer, while the same string is interpreted as local time on Chrome and Firefox. In some cases, it is interpreted as UTC.
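
You can check the inconsistency yourself; the exact results of this sketch depend on the browser you run it in:

Date.parse('2015-10-12 12:00:00');  // NaN on some browsers, a timestamp on others
Date.parse('2015-10-12T12:00:00Z'); // 1444651200000 everywhere -- well-formed ISO-8601 in UTC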

Creating Date Object Using Server Data

Let’s now assume that you receive data from the server. If the data is a numeric Unix time value, you can simply use the constructor to create a Date object. Although I skipped this explanation earlier, when the Date constructor receives a single value as its only parameter, it is interpreted as Unix time in milliseconds. (Caution: JavaScript handles Unix time in milliseconds, which means a value in seconds must be multiplied by 1,000.) If you look at the example below, the resulting value is the same as in the previous example.

const d1 = new Date(1489199400000);
d1.toString(); // Sat Mar 11 2017 11:30:00 GMT+0900 (KST)

Then what if a string type such as ISO-8601 is used instead of Unix time? As I explained in the previous paragraph, the Date.parse() method is unreliable and better avoided. However, since ECMAScript 5 and later versions specify support for ISO-8601, you can carefully use ISO-8601-formatted strings with the Date constructor on Internet Explorer 9.0 or higher, which supports ECMAScript 5.
If you’re supporting browsers that are not the latest versions, make sure to keep the Z letter at the end. Without it, older browsers sometimes interpret the string as local time instead of UTC. Below is an example run on Internet Explorer 10.

const d1 = new Date('2017-03-11T11:30:00');
const d2 = new Date('2017-03-11T11:30:00Z');
d1.toString(); // "Sat Mar 11 11:30:00 UTC+0900 2017"
d2.toString(); // "Sat Mar 11 20:30:00 UTC+0900 2017"

According to the specification, the resulting values of both cases should be the same. However, as you can see, the results of d1.toString() and d2.toString() differ. On the latest browsers, these two values will be the same. To prevent this kind of version problem, you should always add Z at the end of the string if there is no time zone data.

Creating Data to be Transferred to Server

Now you can use the Date object created earlier to freely add or subtract time based on the local time zone. But don’t forget to convert your data back to the original format at the end of processing, before transferring it back to the server.

If it’s Unix time, you can simply call the getTime() method. (Note that it returns milliseconds.)

const d1 = new Date(2017, 2, 11, 11, 30);
d1.getTime(); // 1489199400000

What about strings in the ISO-8601 format? As explained earlier, Internet Explorer 9.0 or higher, which supports ECMAScript 5, supports the ISO-8601 format. You can create ISO-8601 strings using the toISOString() or toJSON() method. (toJSON() can be used for recursive serialization with JSON.stringify() and the like.) The two methods yield the same results, except in how they handle invalid dates.

const d1 = new Date(2017, 2, 11, 11, 30);
d1.toISOString(); // "2017-03-11T02:30:00.000Z"
d1.toJSON(); // "2017-03-11T02:30:00.000Z"

const d2 = new Date('Hello');
d2.toISOString(); // Error: Invalid Date
d2.toJSON(); // null

You can also use the toGMTString() or toUTCString() method to create strings in UTC. They return a string that satisfies the RFC-1123 standard, which you can leverage as needed.
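
Continuing the running example (again assuming a Seoul-based browser):

const d = new Date(2017, 2, 11, 11, 30);
d.toUTCString(); // "Sat, 11 Mar 2017 02:30:00 GMT" -- RFC-1123 style, in UTC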

Date objects also include toString(), toLocaleString(), and their extension methods. However, since these mainly return a string based on the local time zone, and they return varying values depending on the browser and OS used, they are not really useful.

Changing Local Time Zone

You can see by now that JavaScript provides only minimal support for time zones. What if you want to change the local time zone setting within your application without following the OS’s time zone setting? Or what if you need to display multiple time zones at the same time in a single application? Like I said several times, JavaScript does not allow you to change the local time zone manually. The only workaround is to add or remove the offset from the date, provided that you already know the target time zone’s offset. Don’t get frustrated yet though; let’s see whether this can be circumvented.

Let’s continue with the earlier example, assuming that the browser’s time zone is set to Seoul. The user enters 11:30 AM, Mar 11, 2017 in Seoul time and wants to see it in New York local time. The server transfers the Unix time in milliseconds and notifies the client that New York’s offset is -05:00. Then you can convert the data, as long as you know the offset of the target time zone.

In this scenario, you can use the getTimezoneOffset() method. This is the only API in JavaScript for getting local time zone information. It returns the offset of the current time zone in minutes.

const seoul = new Date(1489199400000);
seoul.getTimezoneOffset(); // -540

The return value of -540 means that the local time is 540 minutes ahead of UTC. Be warned that the minus sign is the opposite of Seoul’s plus sign (+09:00). I don’t know why, but this is how it is specified. If we calculate the offset of New York in this convention, we get 60 * 5 = 300. Convert the difference of 840 minutes into milliseconds and create a new Date object from it. Then you can use that object’s getXX methods to format the value as you wish. Let’s create a simple formatter function to compare the results.

function formatDate(date) {
  return date.getFullYear() + '/' +
    (date.getMonth() + 1) + '/' +
    date.getDate() + ' ' +
    date.getHours() + ':' +
    date.getMinutes();
}

const seoul = new Date(1489199400000);
const ny = new Date(1489199400000 - (840 * 60 * 1000));

formatDate(seoul); // 2017/3/11 11:30
formatDate(ny); // 2017/3/10 21:30

formatDate() shows the correct date and time according to the time zone difference between Seoul and New York. It looks like we’ve found a simple solution. Then can we convert to any local time zone as long as we know the region’s offset? Unfortunately, the answer is “No.” Remember what I said earlier: time zone data is a kind of database containing the whole history of offset changes. To get the correct value, you must know the offset at the moment of the date in question, not the offset of the current date.

Problem of Converting Local Time Zone

If you keep working with the example above a little longer, you will soon run into a problem. The user wants to check the time in New York local time and then change the date from the 10th to the 15th. If you use the setDate() method of the Date object, you can change the date while leaving the other values unchanged.

ny.setDate(15);
formatDate(ny); // 2017/3/15 21:30

It looks simple enough, but there is a hidden trap here. What would you do if you had to transfer this data back to the server? Since the data has been shifted by the offset, you can’t directly use methods such as getTime() or toISOString(). You must revert the conversion before sending it back to the server.

const time = ny.getTime() + (840 * 60 * 1000);  // 1489631400000

Some of you may wonder why I bothered to work with the converted data when I have to convert it back anyway. It looks like I could just process the original and temporarily create a converted Date object only when formatting. However, it is not that simple. If you change the date of a Date object based on Seoul time from the 11th to the 15th, 4 days are added (24 * 4 * 60 * 60 * 1000). However, in New York local time, the date changed from the 10th to the 15th, so 5 days must be added (24 * 5 * 60 * 60 * 1000). This means that you must calculate dates based on the local offset to get the precise result.

The problem doesn’t stop here. There is another case where simply adding or subtracting offsets won’t give you the wanted value. Since Mar 12 is the DST start date in New York local time, the offset of Mar 15, 2017 should be -04:00, not -05:00. So when you revert the conversion, you should add 780 minutes, which is 60 minutes less than before.

const time = ny.getTime() + (780 * 60 * 1000);  // 1489627800000

On the contrary, if the user’s local time zone is New York and they want to know the time in Seoul, DST gets applied where it shouldn’t be, causing another problem.

Simply put, you can’t use the current offset alone to perform precise operations for the time zone of your choice. And if you recall the earlier part of this article, you’ll see that there is still a hole in this conversion even if you hard-code the summer time rules: standard times and DST rules change over the years. To get exact values, you need a database that contains the entire history of offset changes, such as the IANA Time Zone Database.

To solve this problem, one would have to store the entire time zone database and, whenever date or time data is retrieved from a Date object, look up the date’s corresponding offset and then convert the value using the process above. In theory, this is possible. But in reality, it takes too much effort, and testing the converted data’s integrity is also tough. But don’t get disappointed yet. Up to now, we discussed the problems of JavaScript and how to solve them; now we’re ready to use a well-built library.
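
To make the idea concrete, here is a minimal sketch of that approach, not production code: the rule table is a hypothetical hand-maintained excerpt for New York covering 2016-17, standing in for the full IANA history, and the function names are my own.

const NY_RULES = [
  { fromUtcMs: Date.UTC(2016, 10, 6, 6), offsetMin: -300 }, // EST from Nov 6, 2016, 06:00 UTC
  { fromUtcMs: Date.UTC(2017, 2, 12, 7), offsetMin: -240 }, // EDT from Mar 12, 2017, 07:00 UTC
];

function offsetAt(utcMs) { // offset in effect at a given absolute time
  let offsetMin = NY_RULES[0].offsetMin;
  for (const rule of NY_RULES) {
    if (utcMs >= rule.fromUtcMs) offsetMin = rule.offsetMin;
  }
  return offsetMin;
}

function newYorkWallClock(utcMs) { // shift, then read only with getUTC* methods
  return new Date(utcMs + offsetAt(utcMs) * 60 * 1000);
}

newYorkWallClock(1489199400000).getUTCHours(); // 21 -- 09:30 PM on Mar 10 in New York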

Moment Timezone

Moment is a well-established JavaScript library that is almost the standard for handling dates. Providing a variety of date and formatting APIs, it is recognized by many users as stable and reliable. And there is Moment Timezone, an extension module, that solves all the problems discussed above. This extension module contains the data of the IANA Time Zone Database to accurately calculate offsets, and provides a variety of APIs for changing and formatting time zones.

In this article, I won’t discuss how to use the library or its structure in detail. I will just show you how simply it solves the problems discussed earlier. If anyone is interested, see Moment Timezone’s documentation.

Let’s solve the earlier conversion problem using Moment Timezone.

const seoul = moment(1489199400000).tz('Asia/Seoul');
const ny = moment(1489199400000).tz('America/New_York');

seoul.format(); // 2017-03-11T11:30:00+09:00
ny.format(); // 2017-03-10T21:30:00-05:00

seoul.date(15).format(); // 2017-03-15T11:30:00+09:00
ny.date(15).format(); // 2017-03-15T21:30:00-04:00

If you look at the result, the offset of seoul stays the same while the offset of ny has changed from -05:00 to -04:00. And if you use the format() function, you get an ISO-8601 string with the offset accurately applied. You can see how simple this is compared to what I explained earlier.

Conclusion

So far, we’ve discussed the time zone APIs supported by JavaScript and their issues. If you don’t need to change the local time zone manually, you can implement the necessary features with the basic APIs, provided you’re targeting Internet Explorer 9 or higher. However, if you need to change the local time zone manually, things get very complicated. In a region with no summer time and a time zone policy that hardly changes, you can partially implement this using getTimezoneOffset() to convert the data. But if you want full time zone support, do not implement it from scratch; use a library like Moment Timezone instead.

I tried to implement time zone support myself, but I failed, which is not so surprising. The conclusion after multiple failures is that it is better to “use a library.” When I first began writing this article, I didn’t know what conclusion I would reach, but here we are. That said, I would not recommend blindly using external libraries without knowing which features JavaScript supports and what kinds of issues it has. As always, it’s important to choose the right tool for your own situation. I hope this article has helped you make the right decision for yourself.

Better Python dependency while packaging your project

I have been brewing this blog topic idea for a long time. I did a lot of searching, reading, and experimenting while working on different projects. But even today, after publishing it, I don’t think I’m 100% satisfied with the available solutions for managing Python project dependencies efficiently.

What is package and dependency management?

Software is released in bundled packages; this way it’s easier to manage installed programs.

A package is a collection of libraries that are bundled together, which makes it easier to download an entire package rather than each library individually.

Almost every package has dependencies of its own, which are handled by the dependency manager.

Dependency management helps manage all the libraries required to make an application work. It’s incredibly beneficial when you’re dealing with complex projects and multiple environments. It also helps you keep track of libraries and update them faster and more easily, as well as solve the problems that arise when one package depends on another.

Every programming language has its flavor of dependency manager.

To summarize all of the above:

  • The library is a collection of already pre-written code.
  • The package is a collection of libraries that are built on each other or using each other one way or another.

Typical way of managing project dependency today

Today the most used Python package manager is pip, used to install and manage Python software packages found in the Python Package Index. Pip lets us Python developers effortlessly control the installation and lifecycle of publicly available Python packages from their online repositories.

Pip can also upgrade, show, and uninstall project dependencies, among other things.

To install a package, you can just run pip install <somepackage>, which will install the Python library into your current environment.

Running pip freeze lists the installed packages and their versions, in case-insensitive sorted order.

Project setup

After building your application, you will need to perform a set of steps to make the application’s dependencies available in different environments.

The steps will be similar to the ones below:

  • Create a virtual environment: $ python3 -m venv /path/to/new/virtual/env
  • Install packages using the $ pip install <package> command
  • Save all the packages in a file with $ pip freeze > requirements.txt. Keep in mind that in this case, requirements.txt will list all packages installed in the virtual environment, regardless of where they came from
  • Pin all the package versions, meaning every package has a fixed version (see the example after this list)
  • Add requirements.txt to the root directory of the project. Done.
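
For illustration, a fully pinned requirements.txt might then look like this (the package names and versions here are only an example):

certifi==2019.9.11
chardet==3.0.4
idna==2.8
requests==2.22.0
urllib3==1.25.7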

Install project dependencies

If you’re going to share the project with the rest of the world, its dependencies will need to be installed by running $ pip install -r requirements.txt

To find more information about an individual package from requirements.txt, you can use $ pip show <packagename>. But how informative is the output?

Example pip show command

How can project dependencies be easily maintained?

Personally, I think the above setup is not easy to maintain, for a variety of reasons:

  1. Sometimes requirements.txt files contain thousands of lines. Maintaining and updating package versions by hand is hard, and automating it (for example, removing development dependencies) is even harder.
  2. If versions are not pinned in requirements.txt, a fresh $ pip install -r requirements.txt will install different packages every time new versions of sub-dependencies are released.
  3. Pip doesn’t have true dependency resolution.
  4. Sometimes you may want to start requirements.txt as an empty file and add modules selectively (this will not work with the pip freeze command).

Are there any better alternatives?

Option 1: multiple requirements.txt files?

There are many examples of projects with multiple requirements.txt files. Developers keep different versions of the requirements.txt file, for example for different environments (e.g., test or local) or for different users (e.g., machines vs. people).

Are multiple requirements.txt files a good solution for managing project dependencies? I disagree… manually managing various requirements.txt files is not a good solution, and it stops being easy once they grow past even ~50 lines.

Option 2: can Pipreqs and Pipdeptree make it better?

I recently tried the pipreqs utility, which generates a requirements.txt file based on the project’s imports. It’s simple to use.

To generate a requirements.txt file you can run pipreqs /your_project/path

Example pipreqs command

Pipdeptree

I thought of combining it with pipdeptree, another handy command-line utility that displays the installed Python packages in the form of a dependency tree.

After executing the pipdeptree command in your terminal window in the project’s virtualenv directory, all the installed Python packages will be displayed as a dependency tree:

Example pipdeptree command

A cool bonus: pipdeptree will warn you when you have multiple dependencies whose versions don’t exactly match.

I found pipreqs handy in some cases, for example:

  • if you want to create a requirements.txt for a git repository listing only the packages required to run its code; packages that the scripts import, without any “extras”
  • it supports options like the clean command
  • it can be used alongside pipdeptree to verify project dependencies

There are some downsides too: pipreqs will not include plugins required for specific projects, so you will end up adding plugin information manually to requirements.txt. It’s not yet a very mature utility.

Option 3: have you tried pip-compile?

The pip-tools package provides two commands: pip-compile and pip-sync.

The pip-compile command generates a requirements.txt from the top-level summary of packages in your requirements.in, with all underlying dependencies pinned. And you can store both the .in and .txt files in version control. How useful, right?

This means that we can get the same versions whenever we run pip install, no matter what the new version is.

The --generate-hashes flag adds package hashes to the output. In this case, pip-compile consults the PyPI index for each top-level package required, looking up the package versions available.

To update all packages, periodically re-run pip-compile --upgrade.

To update a specific package to the latest or a particular version use the --upgrade-package or -P flag.

The pip-sync command is used to update your virtual environment to reflect exactly what’s in requirements.txt: it will install/upgrade/uninstall everything necessary to match the requirements.txt contents.
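
Putting it together, a minimal sketch of the pip-tools workflow (the requests package is just an example):

$ echo "requests" > requirements.in
$ pip-compile requirements.in     # writes requirements.txt with every version pinned
$ pip-sync requirements.txt       # makes the virtualenv match requirements.txt exactly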

Software dependencies are often the largest attack surface

It is very useful to be able to find out, after the fact, what packages were installed and what dependencies your project has.

Organizations usually assume most risks come from public-facing web applications. That has changed. With dozens of small components in every application, threats can come from anywhere in the codebase.

Currently, pip freeze will only show the final flat list of installed packages.

I would recommend using the pipdeptree module, which helps find possible dependency conflicts and displays the actual dependencies of the project.

Example pipdeptree --reverse command, showing leaf packages first

Another piece of good advice is to start building applications using Docker. Every tool we run in Docker is one less tool we have to install locally, so the getting-up-and-running phase will be much faster. But that is a different topic.

Happy packaging!

Setting up Environment Variables in MacOS Sierra

An environment variable is a named object containing data which can be used by multiple applications or processes. Basically, it is just a variable with a name and an associated value. It can be used to determine things like the location of executable files, libraries, the current working directory, the default shell, or local system settings.

Those new to Mac can get overwhelmed by how to set up and manage these environment variables. This guide provides easy ways to do so.

Displaying current Environment Variables

This is very easy. Just open the Terminal and run the command printenv as shown below.

HIMANSHUs-MacBook-Pro:~ himanshu$ printenv
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
TERM_PROGRAM=Apple_Terminal
SHELL=/bin/bash
...

This will list all the environment variables currently set.

However, to display the value of a specific environment variable, run echo $[variable name] in the terminal, as shown below.

HIMANSHUs-MacBook-Pro:~ himanshu$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home

Setting temporary environment variable using terminal

If the environment variable you wish to set is to be used only once or twice, you may want to set a temporary variable, avoiding unwanted variables lingering in the system. You can do this simply by opening the terminal and running the export command followed by the variable name and its value.

HIMANSHUs-MacBook-Pro:~ himanshu$ export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home

The above example sets the variable $JAVA_HOME to the specified value. However, if your requirement is to append a value to an existing environment variable, then assign the value as

export [existing_variable_name]=[new_value]:$[existing_variable_name]

The ‘:’ here appends the new value to the existing value. See the example below.

HIMANSHUs-MacBook-Pro:~ himanshu$ export PATH=/Users/himanshu/Documents/apache-maven-3.5.0/bin:$PATH

Setting permanent environment variable using terminal

Since Mac uses the bash shell, environment variables can be added to the .bash_profile file for the current user. The path to this file is ~/.bash_profile.

Get started by opening this file in a text editor. I’m using nano, a terminal-based text editor (you may use any text editor you like) to open and edit the file.

HIMANSHUs-MacBook-Pro:~ himanshu$ nano .bash_profile

This will open the .bash_profile file in the terminal.

Note: If there is no file named .bash_profile, then the above nano command will create a new file with that name.

Now move to the end of the file, start a new line, and add the desired environment variables using the export command as we did before, as in the example below.
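
For example, using the same paths that appeared earlier in this guide:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
export PATH=/Users/himanshu/Documents/apache-maven-3.5.0/bin:$PATH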

Press ctrl+X to exit the editor, then press ‘Y’ to save the buffer, and you will return to the terminal screen.

We are done now!

You may again run echo $[variable_name] to see the value of your just-saved environment variable.

UPDATE: Don’t forget to close and reopen the terminal before using your newly set environment variable. Reopening the terminal loads the updated .bash_profile file.

Library not loaded: libcrypto.1.0.0.dylib issue in mac

You might have come across this error while dealing with the openssl module.

In order to solve this issue, follow these steps:

Step 1: Install openssl using brew

brew install openssl

Step 2: Copy libssl.1.0.0.dylib and libcrypto.1.0.0.dylib

cd /usr/local/Cellar/openssl/1.0.1f/lib
sudo cp libssl.1.0.0.dylib libcrypto.1.0.0.dylib /usr/lib/

Note the version folder name (1.0.1f above); it will change depending on your openssl version.
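
If you are unsure which version folder you have, listing the Cellar directory will show it:

ls /usr/local/Cellar/openssl/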

Edit (July 2019): If you are getting a permission denied error even with sudo, try copying to /usr/local/lib instead. Thanks to George Hotz in the comments for pointing it out.

Step 3: In /usr/lib, remove the existing links and recreate them

sudo rm libssl.dylib libcrypto.dylib
sudo ln -s libssl.1.0.0.dylib libssl.dylib
sudo ln -s libcrypto.1.0.0.dylib libcrypto.dylib

That’s it. Now try installing what you have been trying to install.

I hope this helps. If you need any further clarification, do comment.

https://mithun.co/hacks/library-not-loaded-libcrypto-1-0-0-dylib-issue-in-mac/

How to upgrade OpenSSL (macOS)

Problem : OpenSSL Security Advisory [3rd May 2016] High severity
Solution : Update it 🙂

Mac OSX 10.11.4

Check version

$ openssl version -a

Backup old version

$ sudo mv /usr/bin/openssl /usr/bin/openssl-old

On 10.12.2 you will get… (and maybe this should help)
mv: rename /usr/bin/openssl to /usr/bin/openssl-old: Operation not permitted

Or remove the old version (skip this if you already backed it up)

$ sudo rm /usr/bin/openssl

Install Homebrew if you don’t have it

$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Or update it if you already have it

$ brew update && brew upgrade

Install OpenSSL with Homebrew

$ brew install openssl

Symbolic link

$ brew link --force openssl

[UPDATE] 2016/12/11

OpenSSL 1.0.2j, Homebrew 1.1.2, Mac 10.11.6

You’ll see…

Warning: Refusing to link: openssl
Linking keg-only openssl means you may end up linking against the insecure,
deprecated system OpenSSL while using the headers from Homebrew’s openssl.
Instead, pass the full include/library paths to your compiler e.g.:
-I/usr/local/opt/openssl/include -L/usr/local/opt/openssl/lib

And yes, we’re doomed! But no worries, we can manually link it with the steps below.

1. Ensure it exists

$ ls -l /usr/local/opt/openssl

You should see (after $ brew install openssl)

lrwxr-xr-x 1 katopz admin 24 Sep 29 00:21 /usr/local/opt/openssl -> ../Cellar/openssl/1.0.2j

2. Link it

$ sudo ln -s /usr/local/Cellar/openssl/1.0.2j/bin/openssl /usr/bin/openssl

On 10.12.2 you will get… (and maybe this should help)
ln: /usr/bin/openssl: Operation not permitted

3. And maybe you’ll need this too

$ mkdir -p /usr/local/lib
$ ln -s /usr/local/opt/openssl/lib/libcrypto.1.0.0.dylib /usr/local/lib/
$ ln -s /usr/local/opt/openssl/lib/libssl.1.0.0.dylib /usr/local/lib/

Close Terminal and reopen then check version

$ openssl version -a

You should see…

OpenSSL 1.0.2j  26 Sep 2016
built on: reproducible build, date unspecified
platform: darwin64-x86_64-cc
options:  bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) idea(int) blowfish(idx)
compiler: clang -I. -I.. -I../include  -fPIC -fno-common -DOPENSSL_PIC -DZLIB_SHARED -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -arch x86_64 -O3 -DL_ENDIAN -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
OPENSSLDIR: "/usr/local/etc/openssl"

Nice! We’re safe now, until another incident appears, though.

And next time you can just…

$ brew update && brew upgrade

Happy OpenSSLing!


Learn How To Quickly Create UIs in Python

Finally a library you can pick up in under 5 minutes

The biggest advantage of Python is its ease of use and the abundance of libraries for just about anything. With a few lines of code, there is nothing you couldn’t do. As long as your Python scripts are for personal use, or your target audience is technical enough, you never even have to think about a user interface (UI).

Sometimes, however, your target audience is not technical enough. They’d love to use your Python scripts, but only as long as they don’t have to look at a single line of code. In those cases, providing command-line scripts simply won’t cut it. You would ideally need to provide them with a UI. Although I wouldn’t be surprised if you have the typical desktop-client versus web-based UI debate, in this blog post the aim is to use Python exclusively.

Python Libraries Available for UI usage

There are essentially three big Python UI libraries: Tkinter, wxPython, and PyQt. While reviewing all three, I realised that everything I liked about Python was nowhere to be found in using them. Python libraries, in general, do a very good job of abstracting away the super technical. If I needed to work with full object-oriented programming, I might as well have loaded up Java or .Net.

Much to my delight, however, I came across a fourth option that seemed to cater to my kind of liking. The library I began reviewing, and ultimately chose for creating Python UIs, is called PySimpleGUI. Funnily enough, this library uses all three popular libraries under the hood, but abstracts away the super technical parts.

Without any further ado, let’s dive in and explore this library by solving a real problem at the same time.

Check that two files are identical

Using the first section, Check the Integrity of the Data, from my previous article, 3 Quick Ways to Compare Data in Python, we can attempt to build a UI.

We essentially need a way to load up two files, and then choose the hashing algorithm we would like to use for the file comparison.

Code the UI

To build that UI, we can use the following code:

import PySimpleGUI as sg

layout = [
    [sg.Text('File 1'), sg.InputText(), sg.FileBrowse(),
     sg.Checkbox('MD5'), sg.Checkbox('SHA1')],
    [sg.Text('File 2'), sg.InputText(), sg.FileBrowse(),
     sg.Checkbox('SHA256')],
    [sg.Output(size=(88, 20))],
    [sg.Submit(), sg.Cancel()]
]

window = sg.Window('File Compare', layout)

while True:  # the event loop
    event, values = window.read()
    # print(event, values)  # debug
    if event in (None, 'Exit', 'Cancel'):
        break

which results in:

Simple Python UI, generated by the above code

Plugging in the logic

With the UI in place, it’s simple to see how to plug in the rest of the code. We simply need to monitor what the user inputs and then act accordingly. We can do that very easily with the following code.

import PySimpleGUI as sg
import re
import hashlib

def hash(fname, algo):
    if algo == 'MD5':
        h = hashlib.md5()
    elif algo == 'SHA1':
        h = hashlib.sha1()
    elif algo == 'SHA256':
        h = hashlib.sha256()
    # read the file one chunk at a time for memory considerations;
    # binary mode so the digest matches tools like md5sum/shasum
    with open(fname, 'rb') as handle:
        for line in handle:
            h.update(line)
    return h.hexdigest()

layout = [
    [sg.Text('File 1'), sg.InputText(), sg.FileBrowse(),
     sg.Checkbox('MD5'), sg.Checkbox('SHA1')],
    [sg.Text('File 2'), sg.InputText(), sg.FileBrowse(),
     sg.Checkbox('SHA256')],
    [sg.Output(size=(88, 20))],
    [sg.Submit(), sg.Cancel()]
]

window = sg.Window('File Compare', layout)

while True:  # the event loop
    event, values = window.read()
    # print(event, values)  # debug
    if event in (None, 'Exit', 'Cancel'):
        break
    if event == 'Submit':
        file1 = file2 = isitago = None
        # print(values[0], values[3])
        if values[0] and values[3]:
            file1 = re.findall('.+:\/.+\.+.', values[0])
            file2 = re.findall('.+:\/.+\.+.', values[3])
            isitago = 1
        if not file1 and file1 is not None:
            print('Error: File 1 path not valid.')
            isitago = 0
        elif not file2 and file2 is not None:
            print('Error: File 2 path not valid.')
            isitago = 0
        elif values[1] is not True and values[2] is not True and values[4] is not True:
            print('Error: Choose at least one type of hash algorithm')
        elif isitago == 1:
            print('Info: Filepaths correctly defined.')
            algos = []  # algorithms to compare with
            if values[1] == True: algos.append('MD5')
            if values[2] == True: algos.append('SHA1')
            if values[4] == True: algos.append('SHA256')
            filepaths = []  # files to compare
            filepaths.append(values[0])
            filepaths.append(values[3])
            print('Info: File Comparison using:', algos)
            for algo in algos:
                print(algo, ':')
                print(filepaths[0], ':', hash(filepaths[0], algo))
                print(filepaths[1], ':', hash(filepaths[1], algo))
                if hash(filepaths[0], algo) == hash(filepaths[1], algo):
                    print('Files match for ', algo)
                else:
                    print('Files do NOT match for ', algo)
        else:
            print('Please choose 2 files.')

window.close()

Running the above code will give you the following outcome:

Closing Thoughts

Although not the prettiest of UIs, this library allows you to quickly spin up simple Python UIs and share them with whomever you need to. More importantly, the code you require to do so is simple and very readable. You will still have the problem of having to run the code to get the UI, which may make sharing a bit difficult, but you can consider using something like PyInstaller, which will turn your Python script into a .exe that people can simply double-click.
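
For instance, assuming the script above is saved as file_compare.py (a file name I am making up here), something like the following would produce a standalone executable:

pip install pyinstaller
pyinstaller --onefile file_compare.py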


BUILD A TEXT GENERATOR WEB APP IN UNDER 50 LINES OF PYTHON

Learn to build a web app which auto-completes any input text


We will be using OpenAI’s GPT-2 as the model and Panel as the web dashboard framework. This guide will be split into two parts. In the first part, we will load our model and write a predictions function. In the second, we will build the web application.

Example text generation application. We will be building a simpler variation of this web app.

What you will need

This tutorial assumes you already have Python 3.7+ installed and have some understanding of language models. Although the steps involved can be done outside of Jupyter, using a Jupyter notebook is highly recommended.

We will be using PyTorch as our deep learning library of choice. Within PyTorch, we will use the transformers library to import the pre-trained GPT-2 model. You can install these libraries by individually entering the following commands in your bash:

pip install torch
pip install transformers

For our web application, we will be utilizing Panel, a great tool for easily creating servable dashboards from either jupyter notebooks or a regular python script. Use the following command to install panel:

pip install panel

Part 1: Setting up the Model

OpenAI’s GPT is a type of transformer model which has received a lot of buzz about its capabilities to produce human-like text. If you have not experimented with it before, you are likely to come away with the same opinion at the end of this read.

Loading the Model

First, we need to import the required packages.

import numpy as np
import torch
import torch.nn.functional as F
from transformers import GPT2Tokenizer, GPT2LMHeadModel
from random import choice

Next, we will load the GPT-2 tokenizer and the language model (it may take a few minutes to download the pre-trained model the first time this is run):

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

Predictions Function

At this stage, most of the work is already done. Since our model is pre-trained, we don’t need to train it or make any modifications. We simply need to write a function which can feed input text to the model and generate a prediction.

def get_pred(text, model, tok, p=0.7):
    input_ids = torch.tensor(tok.encode(text)).unsqueeze(0)
    logits = model(input_ids)[0][:, -1]
    probs = F.softmax(logits, dim=-1).squeeze()
    idxs = torch.argsort(probs, descending=True)
    res, cumsum = [], 0.
    for idx in idxs:
        res.append(idx)
        cumsum += probs[idx]
        if cumsum > p:
            pred_idx = idxs.new_tensor([choice(res)])
            break
    pred = tok.convert_ids_to_tokens(int(pred_idx))
    return tok.convert_tokens_to_string(pred)

There is a lot happening in this function, so let’s break it down. First, we tokenize and encode the input text into input_ids. Then, we ask our model to generate a logits vector for the next word/token. After applying softmax and sorting these probabilities in descending order, we have a vector, idxs, which lists the indices of each token in our vocab in order of their respective probabilities.

At this stage, we could just pick the token which has the highest probability. However, we want to be able to mix up our results so the same input text can generate a variety of text. To do this, we will add an element of randomness where we choose a random token from a list of the most probable next tokens. This way, we are not selecting the same predicted token each time. To do this, we utilize Nucleus (Top-p) Sampling.

We perform this by looping through each probability until the sum of all the probabilities we have looped over is greater than p, an arbitrary number between 0 and 1. All the tokens iterated through until p is exceeded are stored in a list, res. Once p is exceeded, we choose a token at random from this list, so we are not selecting the same predicted token each time. Remember that the list of probabilities we are looping through contains indices ordered by their respective probabilities. Note that the higher p is, the more tokens will be included in our list, and vice versa. Therefore, if you want the same result each time, you can set p to 0.

Now, let’s test out our get_pred function a few times:

Each time, there is a different result which is exactly what we expect. Our prediction function is now ready. Let’s build our web app!

Part 2: Building the Web Application

Panel Overview

If you are not familiar with Panel, it facilitates the process of creating web dashboards and apps. At first glance, what you need to know is that it has three primary components:

  • Panels: containers which can contain one or more panes (objects) such as text, images, graphs, widgets, etc. (they can contain other panels as well)
  • Panes: any single object such as text, image, dataframe, etc.
  • Widgets: user adjustable items such as text input, sliders, buttons, checkboxes which can alter the behavior of panes

The next and final thing you need to know for our purposes is that there are multiple ways to define how different panes and widgets interact with each other. These are called “callbacks.” For example, if a certain button is pressed, how should the other panes be updated? We will define a callback function later on which does exactly this.

High Level Application Overview

Our text generator app will have an input for a user to enter their desired text. Next, the user should be able to generate a new token with a press of a button. After which, new text will be generated with a predicted token from the function we defined in Part 1. Lastly, the user should be able to continue to generate new text on top of the already predicted tokens.

Implementation

Let’s first import panel and create the text input widget:

import panel as pn
pn.extension()  # load panel's extension for jupyter compatibility

text_input = pn.widgets.TextInput()

Now, if we execute text_input in jupyter, we get the following:

Next, we want a pane which will store the whole text as we generate more and more tokens:

generated_text = pn.pane.Markdown(object=text_input.value)

Notice that we set the object of generated_text to the value of text_input. We want generated_text to start with the same value as text_input, since we will be predicting new text on top of generated_text. As more tokens are added to our sequence, we will keep predicting over generated_text until the user changes text_input, in which case the process restarts.

However, we are not quite done yet. Although generated_text takes the value of text_input at initiation, it will not update itself when the text_input value changes. For this, we need to link the two objects together, like so:

text_input.link(generated_text, value='object')

Here, we have formed a unidirectional link from text_input to generated_text. So whenever the value of text_input changes, the value of generated_text is changed to the new value as well. See:

observing linked behavior between text_input and generated_text in a panel. Note: pn.Row as a component is a panel i.e. container of panes and widgets

Now that we have both our text objects, let’s create our button widget:

button = pn.widgets.Button(name="Generate", button_type="primary")

Great, now that we have a button, we just have to link it to our desired behavior. For this we will be writing a callback function which will run every time the button is clicked:

def click_cb(event):
    # predict the next token from the current text and append it
    pred = get_pred(generated_text.object, model, tok)
    generated_text.object += pred

Two things happen here. First, we pass generated_text as the input to the prediction function we wrote earlier, which gives us a new token. Second, this token is appended to generated_text. This process repeats each time the button is clicked.

Speaking of which, we still have to tie the button click to the callback function. We can do that with:

button.on_click(click_cb)

We are now through creating all our widgets, panes, and functions. We just need to put these objects in a panel and voilà:

app = pn.Column(text_input, button, generated_text); app
Note: pn.Column, similar to pn.Row, is another type of panel, i.e. a container of widgets, panes, and even other panels.

Let’s add a title and a brief description and we are through!

title = pn.pane.Markdown("# **Text Generator**")
desc = pn.pane.HTML("<marquee scrollamount='10'><b>Welcome to the text generator! In order to get started, simply enter some starting input text below, click generate a few times and watch it go!</b></marquee>")
final_app = pn.Column(title, desc, app)

Serve the Application

Panel makes it very easy to serve the app. There are two methods which can be used to do this. The first one is the “.show()” method, which is usually used for debugging; it is used as below and will launch a new window with our final_app panel running as a web application.

final_app.show()

In order to put it in a production environment, you need to use the “.servable()” method. However, if you run this the same way as the show method, nothing different will happen in your current notebook. Instead, you have to serve the notebook from your machine’s shell like this:

panel serve --show text_generation_app.ipynb

This will launch your app on a local port as long as you have the following code in your notebook:

final_app.servable()

Done.


By now, you have everything you need to build your own text generation app. You can build further upon it by adding more Panel components. You can even embed this app in your other projects. As always, you can find my code base on GitHub. Note: the app in the title image is the advanced variation found in my tutorial notebook: text_generation_app.ipynb.

devkosal/gpt-panel-app (github.com)


WRITTEN BY

Dev Sharma

MSc Analytics @ Columbia

Open My Home Kubernetes Cluster to the Internet and Secure It with a Let’s Encrypt TLS Certificate

After struggling for a few weeks, I could finally open my page in the Chrome browser on my mobile, running on my home Kubernetes cluster and hosted on my public domain. I didn’t even have to tolerate that dazzling “not secure” icon and the little red text reminding me that my site is not trusted, because it was protected with a TLS certificate issued by Let’s Encrypt. The whole setup was free, besides the monthly bill from my ISP and the cost of powering my 10-year-old PC, and I will tell you how to do it.

Opening your home Kubernetes cluster to the internet can be significant. Imagine you are a freelancer who wants to run a demo site for a client for a few days: it can be hosted on your PC, and at least they won’t complain that your page has a bug because they saw a little red warning text next to the address bar. It can also be your last frontier in the free zone before you move to the cloud; until now, I have still hidden my credit card number from Google, AWS, and Azure.

Anyway, if you plan to do what I did, you will need to run your own Kubernetes cluster locally. You can find my previous post on how to configure a home Kubernetes cluster with the Rancher server.

The header picture illustrates my home network setup and how incoming requests from the internet are forwarded into my Kubernetes cluster. You can jump to the next section, about the TLS certificate setup, if you find the picture instructive enough.

  • Like an ordinary home network, I have a wireless router connecting to my ISP; behind it is my first-tier private LAN, using the network address 192.168.1.0/24. My wireless router is also a DHCP server, which assigned the IP address 192.168.1.128/24 to my desktop PC.
  • My desktop PC runs Windows 10 with VMware installed as the hypervisor. The NAT network managed by VMware is my second-tier private LAN, using a different network address, 192.168.24.0/24. The Ubuntu virtual machines spun up in this second-tier network form my local Kubernetes cluster; one of the worker nodes was assigned the IP address 192.168.24.149/24.
  • To open my Kubernetes cluster to the internet (explicitly, to open the Nginx ingress controller running on the worker nodes), I configured port forwarding rules on both the wireless router and the VMware hypervisor, which lets incoming requests from the internet be forwarded to the Nginx ingress controller.
  • Another critical setting for letting requests through is adding an inbound rule to my Windows 10 firewall. The default rule set blocked incoming requests to both the HTTP (80) and HTTPS (443) ports, so an allow rule is necessary for establishing the connection.
  • Meanwhile, I registered a free public domain, “hung-from-hongkong.asuscomm.com”, with the DDNS service that comes with my ASUS wireless router. I believe well-known DDNS providers such as Google Domains, DynDNS, or No-IP are supported by most wireless routers on the market.
  • Finally, to verify the above settings, I tested the Nginx ingress controller by making requests from the internet. I tested with my mobile: even though no ingress rules had been defined yet, Nginx returned a 404 page. I also used canyouseeme.org, a utility page that captures your public IP address and checks whether the HTTP and HTTPS ports are open (a curl version of the same check is sketched just after this list).
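For reference, here is the same reachability check done with curl from any machine outside the home network; the domain is the DDNS name registered above, and these commands are my addition rather than part of the original walkthrough:

# The Nginx default backend answers 404 while no ingress rule exists yet
curl -I http://hung-from-hongkong.asuscomm.com

# -k skips certificate verification; only the ingress controller's
# self-signed placeholder certificate is served at this stage
curl -kI https://hung-from-hongkong.asuscomm.com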

Let’s Encrypt, DNS-01 and HTTP-01 challenge

Congratulations! If you have followed along to this point, your Kubernetes cluster should be accessible from the internet too. Now that I had my public domain, I could request a TLS certificate for it from Let’s Encrypt.

  • Let’s Encrypt is a CA (Certificate Authority) that offers free TLS certificates; it verifies requests and delivers certificates using the ACME protocol.
  • First, it requires an agent deployed on my Kubernetes cluster. The agent is responsible for raising the certificate request to the Let’s Encrypt service, completing either the DNS-01 or the HTTP-01 challenge, and installing the certificate delivered by the CA. The challenge is part of the ACME protocol; it lets the CA validate that the public domain in the certificate request is really managed by the requester.
  • With the DNS-01 challenge, the agent is asked to update the text (TXT) record (a type of DNS record) of the domain. Since I relied on ASUS’s DDNS service to register my public domain, and it does not provide a way to update the TXT record, I could only take the HTTP-01 challenge option.
  • With the HTTP-01 challenge, the agent has to publish a given token at a pre-agreed URL; after the Let’s Encrypt servers have verified that content, they deliver a new TLS certificate to the agent (the shape of that pre-agreed URL is sketched right after this list).
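To make the HTTP-01 flow concrete, the pre-agreed URL has a fixed, well-known shape defined by the ACME standard (RFC 8555); the token below is a placeholder:

# During an HTTP-01 challenge, Let's Encrypt fetches a URL of this shape
# over plain HTTP on port 80 (which is why port 80 must be reachable
# through the router and firewall settings above):
#
#   http://hung-from-hongkong.asuscomm.com/.well-known/acme-challenge/<TOKEN>
#
# The agent must serve the token (plus its account key thumbprint) there.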

Cert-Manager and Helm

I found Cert-Manager, an ACME agent implementation for the Kubernetes environment; if you search for both “Kubernetes” and “Let’s Encrypt” on Google, it should be listed within the top 10 results. The tool integrates with the Nginx ingress controller to complete the HTTP-01 challenge automatically.

Install Helm and Tiller

  • Cert-Manager is available as a Helm chart package, so I had to install Helm first. Helm is a packaging system for Kubernetes resources.
  • Helm comes with a backend service, Tiller, which deploys the different Kubernetes resources in a Helm chart package. To run Tiller on a Kubernetes cluster with Role-Based Access Control (RBAC) enabled (a cluster created by Rancher has RBAC enabled by default), Tiller needs to run with a service account granted the cluster-admin role. I captured the script to install Helm below:
# Install Helm with snap
sudo snap install helm --classic

# Create a service account for Tiller with the following manifest
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system
EOF

# Install Tiller - the backend service for Helm
helm init --service-account tiller

# Verify Helm client and Tiller server installation
helm version

Install Cert-Manager

  • Cert-Manager’s documentation recommends installing it into a separate namespace, and I captured only the necessary steps to install Cert-Manager:
# Install the CustomResourceDefinition resources separately
kubectl apply --validate=false -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.11/deploy/manifests/00-crds.yaml

# Create the namespace for cert-manager
kubectl create namespace cert-manager

# Add the Jetstack Helm repository
helm repo add jetstack https://charts.jetstack.io

# Update your local Helm chart repository cache
helm repo update

# Install the cert-manager Helm chart
helm install \
  --name cert-manager \
  --namespace cert-manager \
  --version v0.11.0 \
  jetstack/cert-manager

# Verify the cert-manager installation
kubectl get pods --namespace cert-manager

Create Issuer for Let’s Encrypt production service

  • Now I came to the ACME agent part. Issuer and ClusterIssuer are types of Kubernetes resources that come with Cert-Manager; an Issuer can only work with resources in its own namespace, while a ClusterIssuer has no such restriction.
  • An issuer is responsible for dealing with different types of CA and issuing TLS certificates for ingress rules. The following manifest defines a ClusterIssuer that acts as the agent for the Let’s Encrypt production service; the spec.acme.solvers property specifies the HTTP-01 challenge for verification and integrates with the Nginx ingress controller.
  • Other than the production service, Let’s Encrypt also provides a staging service; to switch to it, you just need to change the spec.acme.server property to the staging URL (https://acme-staging-v02.api.letsencrypt.org/directory).
# Create the cluster issuer with the following manifest
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The URL for Let's Encrypt production service
    server: https://acme-v02.api.letsencrypt.org/directory
    # My Email address used for ACME registration
    email: kwonghung.yip@gmail.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx
EOF

# Verify the resource
kubectl describe clusterissuer letsencrypt-prod

Request a TLS certificate and save it into a Secret

  • The next step is to request a TLS certificate. The Certificate resource introduced by Cert-Manager is actually for making the certificate request (a little bit confusing, ha!); the received TLS certificate is eventually stored as a Kubernetes Secret object.
  • As you can find in the Kubernetes official reference, the spec.tls.secretName property of an Ingress rule defines which Secret contains the TLS key pair. This means you can apply a TLS certificate without using Cert-Manager at all, but Cert-Manager does give you a convenient way of handling the certificate.
  • The following manifest defines a Certificate resource that refers to the ClusterIssuer created before; the received TLS certificate is stored in a Secret named tls-public-domain.
# Create a certificate resource to request a certificate from the Cluster Issuer
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: tls-public-domain
  namespace: default
spec:
  dnsNames:
  - hung-from-hongkong.asuscomm.com
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: letsencrypt-prod
  secretName: tls-public-domain
EOF
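As a quick sanity check (standard kubectl commands, my addition rather than one of the original steps), you can follow the request and confirm the Secret appears once the HTTP-01 challenge succeeds:

# Follow the request until the Ready condition becomes True
kubectl describe certificate tls-public-domain --namespace default

# The issued key pair is stored in this Secret (type kubernetes.io/tls)
kubectl get secret tls-public-domain --namespace default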

Deploy the Tomcat service for testing

  • After the TLS certificate Secret had been created, I deployed a Tomcat service for verification. A sample service was necessary because I needed an Ingress rule that actually uses the TLS certificate Secret. I used Tomcat because I am a Java developer and it provides a default welcome page for verification.
  • I packed the Tomcat service as a Helm chart package and host it on GitHub Pages; you can refer to my other post for details. The following script shows how to deploy Tomcat with Helm; the Ingress rule comes with the package.
# Add my Helm repository hosted on GitHub Pages
helm repo add hung-repo https://kwonghung-yip.github.io/helm-charts-repo/

# Update the local Helm charts repository cache
helm repo update

# Install the tomcat service
helm install hung-repo/tomcat-prod --name tomcat

# Verify the ingress rule manifest after installing tomcat, sample output below:
helm get manifest tomcat
...
...
---
# Source: tomcat-prod/templates/ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: tomcat-tomcat-prod
  labels:
    app.kubernetes.io/name: tomcat-prod
    helm.sh/chart: tomcat-prod-0.1.0
    app.kubernetes.io/instance: tomcat
    app.kubernetes.io/version: "9.0.27"
    app.kubernetes.io/managed-by: Tiller
spec:
  tls:
  - hosts:
    - hung-from-hongkong.asuscomm.com
    secretName: tomcat-acme-prod
  rules:
  - host: hung-from-hongkong.asuscomm.com
    http:
      paths:
      - backend:
          serviceName: tomcat-tomcat-prod
          servicePort: 8080
  • After going through all the steps, the welcome page was exposed and secured (a quick external check is sketched below).
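As that quick external check (my addition, using standard OpenSSL tooling), you can inspect the certificate the ingress now serves and confirm the issuer is Let’s Encrypt:

# Print the issuer and validity dates of the certificate served on port 443
echo | openssl s_client -connect hung-from-hongkong.asuscomm.com:443 \
    -servername hung-from-hongkong.asuscomm.com 2>/dev/null \
  | openssl x509 -noout -issuer -dates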

Conclusion and further work

In this post, I shared my findings and the steps by which I opened my home Kubernetes cluster to the internet and secured it with a Let’s Encrypt TLS certificate.

Other than acting as an ACME agent, a Cert-Manager Issuer also supports a self-signed certificate as the Certificate Authority. This allows issuing a certificate for a wildcard domain within your private LAN; with a wildcard domain, different services can have their own customized domains, all under a single self-signed root certificate.
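A minimal sketch of such an issuer, assuming the same cert-manager v0.11 CRDs used above (the resource name is mine):

# A self-signed (Cluster)Issuer needs no external CA at all
cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
EOF

In practice, you would use an issuer like this to bootstrap your own root CA certificate and then sign the wildcard certificate from that CA, so trusting one root covers every service in the LAN.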

Other further works can be:

  • Bridge GitHub or another public repo and the home Kubernetes cluster with a webhook, to automate the deployment process.
  • Instead of forwarding requests to only one of the worker nodes, forward them to an HA proxy that acts as a load balancer across all worker nodes.

In the next post, I will look into service meshes, Istio, and their implementations.

The sections below supplement the technical details for your reference. Please feel free to leave a comment or message me; my contact info can be found at the end of this post.


DDNS settings in my ASUS wireless router

Port forwarding settings in my ASUS router

Port forwarding setting for VMWare Hypervisor

Windows 10 firewall inbound rule settings

References and resources

  • [Wireless][WAN] How to set up Virtual Server / Port Forwarding on ASUS Router? - Official Support - www.asus.com
  • [WAN] How to set up DDNS? - Official Support, ASUS USA - www.asus.com
  • Change NAT Settings (gateway IP address, port forwarding, and advanced networking settings for NAT) - docs.vmware.com
  • Automatically creating Certificates for Ingress resources - cert-manager documentation - docs.cert-manager.io

email: kwonghung.yip@gmail.com

linkedin: linkedin.com/in/yipkwonghung

Twitter: @YipKwongHung

github: https://github.com/kwonghung-YIP

WRITTEN BY

Kwong Hung Yip

Developer from Hong Kong