Ethereum Token Contract ABI in Web3.js for ERC-20 and Human Standard Tokens

This post will introduce you to Token Contract ABIs in Ethereum, and show you how you can use a the Human Standard Token ABI to access the token balances of ERC-20 compatible tokens.

 

Let me start by saying that this post may be trivial to some, but was very confusing to me, a brand new user to the Ethereum and Web3.js development world. After writing my “Hello World” project which gets the ETH balance for an Ethereum Address in Web3, I began to think about the next small step I can take and teach to others. Naturally, I thought it would make sense to do a similar project, but instead, get the Token Balance for ERC-20 tokens at an Ethereum Address.

With Web3.js, you can easily find the template for this functionality:

var tokenContract = eth.contract(tokenABI).at(tokenAddress);
var tokenBalance = tokenContract.balanceOf(ethereumAddress);

Seems easy enough to get the Token Address, but what is this tokenABI value that we need to use?

A simplified explanation of a token contract ABI

A little bit of searching will give you documents that teach you about the Application Binary Interface (ABI) like this, but no real layman’s terms explanation. Here is my attempt at one:

In Ethereum, the Application Binary Interface (ABI) is a predefined template of the functions a contract exposes.

You know when you import a new library into an IDE, you automatically get all that nice autocomplete and Intellisense? You type the library name, add “.” and a list of functions appears in front of you:

Imagine if instead, you needed to know ahead of time the functions the library exposes, and then define them for the IDE so that the autocomplete would work… that is pretty much what is happening here.

An Ethereum Contract ABI will define the different functions that a contract exposes, and each function definition will contain things like the function type, name, inputs, outputs, and more. It even contains information like whether the contract accepts payments in ether or not. Here is the JSON ABI of the balanceOf() function we ultimately want to use:

{
  "constant": true,
  "inputs": [
    {
      "name": "_owner",
      "type": "address"
    }
  ],
  "name": "balanceOf",
  "outputs": [
    {
      "name": "balance",
      "type": "uint256"
    }
  ],
  "payable": false,
  "type": "function"
}

I think this is easy enough to read: It is a function which accepts an address as an input, and outputs a balance as an unsigned int. But is this enough to start talking to an Ethereum contract? What about all the other functions that contract might have? How will I find the full contract ABI for each contract I want to talk to?

Here are the things I learned when trying to answer these questions:

  • You do NOT need the full ABI to interact with a Token Contract. You only need to define the functions which you want to use.
  • You cannot programmatically generate the ABI for a given contract using data from the Ethereum blockchain. In order to generate the full contract ABI from scratch, you will need the full contract source code, before it is compiled. Note that only the compiled code exists at a contract address.
  • Some contracts have functions which are intentionally ‘hidden’ from the public, and they do not intend the public to use.

Therefore, there really is no way to dynamically call any contract address. If only there were some standard set of functions shared across all contracts…

The ERC-20 and Human Standard Token

From the specification:

Simple Summary

A standard interface for tokens.

The value of ERC-20 tokens is that they all have a standard set of functions which allow you to interact with each of them in the exact same way. This is why there is so much hype around new Ethereum application which use ERC-20 tokens: there is nearly zero effort to add these tokens to existing platforms like Cryptocurrency Exchanges which means smoother adoption, easier to sell/track, and of course more hype ($$$).

What does it take to be ERC-20 compliant?

Not much really. You just need to expose the non-optional methods and events described here. Beyond the core ERC-20 standard, there are also standard optional parameters which are intended for humans. See here:

In other words. This is intended for deployment in something like a Token Factory or Mist wallet, and then used by humans.
Imagine coins, currencies, shares, voting weight, etc.
Machine-based, rapid creation of many tokens would not necessarily need these extra features or will be minted in other manners.

1) Initial Finite Supply (upon creation one specifies how much is minted).
2) In the absence of a token registry: Optional Decimal, Symbol & Name.
3) Optional approveAndCall() functionality to notify a contract if an approval() has occurred.

This is the “Human Standard Token”, which of course is a super-set of the ERC-20 standard. Additionally, many tokens have a version() function which is also available in the ABI provided below.

So let’s get it working!

Now that you have sufficient background to understand what is going on, lets actually go and make some calls to ERC-20 token contracts.

To start, you can find the Human Standard Token ABI here on GitHub. The JS file simply puts the ABI JSON object into a varaible called human_standard_token_abi which allows you to really easily use it in a project.

The most basic project that can take advantage of these things would look something like this:

<html>
<head>
  <meta charset="UTF-8">
  <script type="text/javascript" src="./web3.min.js"></script>
  <script type="text/javascript" src="./human_standard_token_abi.js"></script>
  <script>
    var web3 = new Web3(new Web3.providers.HttpProvider("https://mainnet.infura.io/<APIKEY>"));

    address = "0x0e2e75240c69495d2b9e768b548db381de2142b9" //From Etherscan
    contractAddress = "0xd26114cd6EE289AccF82350c8d8487fedB8A0C07" //OMG
    contractABI = human_standard_token_abi

    tokenContract = web3.eth.contract(contractABI).at(contractAddress)

    console.log(tokenContract.balanceOf(address).toNumber())
  </script>
</head>
<body>
  <p>Check the console (F12)</p>
</body>
</html>

And the output:

What if…

Let’s try a few fringe scenarios with this contract ABI. We will be testing this against OmiseGo.

  • What if we call a contract with a function we defined, but does not exist on the contract?

OMG does not have a version() function, but we define it by default in our human_standard_token_abi. So let’s call it:

console.log(tokenContract.version())

“Uncaught Error: new BigNumber() not a base 16 number”

  • What if we call a contract with a function that exists, but we did not define in the ABI?

OMG has the function mintingFinished() which is not part of the Human Standard Token ABI. If we call it, we get the following error:

“Uncaught TypeError: tokenContract.mintingFinished is not a function”

 

What I hope you learned:

  • The requirements to call an ERC-20 compliant token contract using Web3.js
  • What a Token Contract ABI is on the Ethereum Blockchain
  • Why the ERC-20 standard is pretty great for develeopers
  • How to easily add the human_standard_token_abi to your JavaScript web application

Keep an eye out for my next post, which will detail how to set up Web3.js for JavaScript promises, so we can execute multiple Web3 functions asynchronously. Then we will take both of these pieces, and make a simple web app that can fetch the ERC-20 token balance at an Ethereum address.

If you liked this post, and want to support me, feel free to send donations here:

0xD62835Fe2B40C8411A10E7980a290270e6A23cDA

Correcting the Ethereum and Web3.js “Hello World”

Just 2 days ago I blogged about a quick project which I considered a “Hello World” application for Ethereum and Web3.js. However, I quickly learned that even in my short 31 lines of code, I made numerous mistakes which do not follow the best practices for developing Web3.js applications.

The main part of the sample was the Web3.js stuff, which could be broken into two logical sections:

  1. Establishing a Web3 Provider
  2. Getting the ETH balance of an Ethereum Address

Both of these sections had mistakes in my original code, and this post will show you how to fix them! I will be updating the main blog post to include these changes as well, but I wanted to document the subtleties of the changes, and what I have learned since then. BTW, all of these mistakes could be avoided if you read the MetaMask developer documentation.

Ethereum Browser Environment Check

In my original sample, I simply depend on the Web3 HTTP Provider to access the Ethereum network. However, using MetaMask or the Mist Browser, users will already have direct access to the Ethereum network through those providers, and do not need to use the HTTP Provider. As said in the Web3 JavaScript app API Documentation:

…you need to create a web3 instance, setting a provider. To make sure you don’t overwrite the already set provider when in mist, check first if the web3 is available…

To fix this, we mostly follow the code sample provided by MetaMask:

window.addEventListener('load', function () {
    if (typeof web3 !== 'undefined') {
        console.log('Web3 Detected! ' + web3.currentProvider.constructor.name)
        window.web3 = new Web3(web3.currentProvider);
    } else {
        console.log('No Web3 Detected... using HTTP Provider')
        window.web3 = new Web3(new Web3.providers.HttpProvider("https://mainnet.infura.io/noapikey"));
    }
})

As they mention on the MetaMask developer documentation:

Note that the environmental web3 check is wrapped in a window.addEventListener('load', ...) handler. This approach avoids race conditions with web3 injection timing.

With our new code, as soon as the page loads, we detect if the browser being used already has a Web3 provider set up, and if it does we use it! Otherwise, we will use the HTTP Provider from Infura.io. For most users, I would assume they do not have MetaMask, and thus this change is not very important; but it is certainly best practice, and I am happy to oblige.

Chrome with MetaMask:

Firefox without Web3 Provider:

Asynchronous calls to the Ethereum network

If you have been following along word for word, you might have copied the changes mentioned above, loaded it in your MetaMask enabled browser (from your web server), and tried to get your ETH balance… Here is what you will see:

If we continue to read the MetaMask developer documentation, we would see the following:

The user does not have the full blockchain on their machine, so data lookups can be a little slow. For this reason, we are unable to support most synchronous methods.

This means we need to turn our call to get the ETH balance, which is currently a synchronous HTTP request, into an asynchronous request. We can do this by adding an error first callback as the last parameter of the function:

function getBalance() {
    var address, wei, balance
    address = document.getElementById("address").value
    try {
        web3.eth.getBalance(address, function (error, wei) {
            if (!error) {
                var balance = web3.fromWei(wei, 'ether');
                document.getElementById("output").innerHTML = balance + " ETH";
            }
        });
    } catch (err) {
        document.getElementById("output").innerHTML = err;
    }
}

If we try to run this code now with MetaMask as our provider, everything works again!

The first, but certainly not last mistake…

Phew! We fixed our Hello World application! Take a look at the overall changes on GitHub. I think this goes to show how difficult it can be to learn things on your own, and some of the best practices that can be overlooked so easily. I hope that I am able to go through these issues so that you don’t have to. If you find any other issues with this or future samples I create, please let me know!

Special shout out to Reddit user JonnyLatte for telling me the errors in my ways, and getting me to read more of the documentation around Web3!

As always, if you found this content helpful, feel free to show some appreciation at this address: 0xD62835Fe2B40C8411A10E7980a290270e6A23cDA

Ethereum and Web3.js “Hello World”: Get the ETH Balance of an Ethereum Address

Using just 41 lines of HTML + JS, we create a Web3.JS application which can get the ETH Balance of an Ethereum Address [Final Result] [GitHub]

 

For me, the hardest part of learning new technical skills is overcoming the hurdle of simply getting started. The Ethereum development space is booming, and the ability to make relatively simple web applications that interact with the Ethereum blockchain is at a premium. Today, development on World Wide Web requires you to compete with a huge number of fully developed, feature rich applications, where it is very unlikely that you are actually contributing value. However, the same is absolutely not true for Ethereum and blockchain as a whole. There are so many utilities and tools that can bring value to this ecosystem, all with relatively low feature requirements.

So let’s overcome the first barrier by building a “Hello World” application.

From my perspective, the perfect project for something like this would be a bare-bones single-page application which fetches the ETH balance of an Ethereum address. This is about as simple as it gets to allow a user to interact with the blockchain, and thanks to Web3.js, it is also really simple to implement!

Prerequisites

To gain access to the Ethereum network, you will need to gain access to a Web3 Provider. As I will talk about more below, this comes natively with certain Ethereum focused browsers, but for the average user you will need to provide them with their own gateway to the blockchain. Basically, you need someone to provide your app the data that is actually on the blockchain, and Web3.js has the ability to interact directly with an HTTP Provider to bring you this data with minimal effort.

I used Infura.io as my Ethereum provider (for no other reason than they showed up first when searching), and after spending less than a minute registering with them for free, I was given my unique address to their Main Ethereum Network where my API Key was appended at the end.

Save this URL, as you will be using it very shortly.

https://mainnet.infura.io/<APIKEY>

The only other thing you need to get started is your own copy of Web3.js which can be found on GitHub. Just download and unpack the ZIP file.

Create a new folder where you want your project to live, and create an index.html file. Then, from the Web3.js download, copy web3.min.js to that folder.

ethbalance/     (folder)
├── index.html
└── web3.min.js

Finally, make sure you initialize your index.html skeleton:

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
</head>

<body>
</body>
</html>

Let’s get started!

To make this as simple as possible, we are going to create a single HTML file which will contain all the code necessary to complete this project. It will be broken into 2 parts:

  1. HTML to render a bare-bones web page
  2. JavaScript to initialize a Web3 object, and interact with our HTTP Provider

HTML Body

Our HTML body needs a text field for the user to input an Ethereum address, a button to trigger the JavaScript, and an output area to display the result. I am going to assume that if you are reading this post, you have enough familiarity with HTML that I can just breeze over this, and give you the code:

<body>
    <h1>ETH Balance Fetcher</h1>
    <p>Enter your Ethereum Address:</p>
    <input type="text" size="50" id="address" />
    <button type="button" onClick="getBalance();">Get Balance</button>
    <br />
    <br />
    <div id="output"></div>
</body>

The only thing of note here is that when you click the button we created, it triggers a JavaScript function getBalance(), and that is what we are going to write next!

HTML Head: JavaScript and Web3

Now it is time to prepare the JavaScript required to make this all work. We are going to need to get the Ethereum address inputted by the user, initiate our connection to the Ethereum Provider, and then query the blockchain for the ETH balance at that address. Oh, of course we will also send back the result and update the HTML with the value. Here is our HTML head template:

<head>
    <!-- Check if Web3 already defined -->
    <!-- If not, connect to HTTP Provider -->
    <!-- getBalance() function -->
</head>

First we will load and set up our Web3 provider. If you are using an Ethereum compatible browser like Brave, Chrome + MetaMask, or Mist, you will already have your own Web3 provider established natively. You can access that connection with this:

<head>
    <meta charset="UTF-8">
    <script type="text/javascript" src="./web3.min.js"></script>
    <script type="text/javascript">
        window.addEventListener('load', function () {
            if (typeof web3 !== 'undefined') {
                console.log('Web3 Detected! ' + web3.currentProvider.constructor.name)
                window.web3 = new Web3(web3.currentProvider);
            }
        })

    <!-- If not, connect to HTTP Provider -->
    <!-- getBalance() function -->
    </script>
</head>

 

More likely, the user does not have one of these browsers, so we need to establish our own connection to the Ethereum network. We can do this with the URL that you saved earlier from Infura.io, and establishing an HTTP Provider:

<head>
    <meta charset="UTF-8">
    <script type="text/javascript" src="./web3.min.js"></script>
    <script type="text/javascript">
        window.addEventListener('load', function () {
            if (typeof web3 !== 'undefined') {
                console.log('Web3 Detected! ' + web3.currentProvider.constructor.name)
                window.web3 = new Web3(web3.currentProvider);
            } else {
                console.log('No Web3 Detected... using HTTP Provider')
                window.web3 = new Web3(new Web3.providers.HttpProvider("https://mainnet.infura.io/<APIKEY>"));
            }
        })

    <!-- getBalance() function -->
    </script>
</head>

At this point, we can do just about anything that Web3.js offers, but for our purposes we only need to query the blockchain for the address, and return the ETH balance. So let’s set up our getBalance() function:

<head>
    <meta charset="UTF-8">
    <script type="text/javascript" src="./web3.min.js"></script>
    <script type="text/javascript">
        window.addEventListener('load', function () {
            if (typeof web3 !== 'undefined') {
                console.log('Web3 Detected! ' + web3.currentProvider.constructor.name)
                window.web3 = new Web3(web3.currentProvider);
            } else {
                console.log('No Web3 Detected... using HTTP Provider')
                window.web3 = new Web3(new Web3.providers.HttpProvider("https://mainnet.infura.io/<APIKEY>"));
            }
        })
        function getBalance() {
            var address, wei, balance
            address = document.getElementById("address").value
            try {
                web3.eth.getBalance(address, function (error, wei) {
                    if (!error) {
                        var balance = web3.fromWei(wei, 'ether');
                        document.getElementById("output").innerHTML = balance + " ETH";
                    }
                });
            } catch (err) {
                document.getElementById("output").innerHTML = err;
            }
        }
    </script>
</head>

Walking through the function:

First we store the value of the text field from our HTML page into the address variable. Then, we will try to use the web3 object we intialized earlier to call the function web3.eth.getBalance() which accepts an Ethereum Address as an input. Note that we need to make this call asynchronously as the user does not have the full blockchain loaded on their machine, so some calls may run slow. Rather than lock the user’s interface, we let the the call happen in the background, and when it is complete, we trigger an update to the page. This is required to support MetaMask, but benefits all Web3 applications. If you want to learn more about how to make these requests asynchronous, take a look at the “Using callbacks” section in the Web3 documentation.

Once the asynchronous request is complete, we will get back a Wei balance as a result. But we want the Ether value, so we do one last step to convert the value: web3.fromWei(wei, 'ether'). If all of this is successful, we update the output div with our result, otherwise if it fails at any point we catch the error, and output that message instead.

Here is the final index.html file which you should be able to use as soon as you paste in your <APIKEY> from Infura.io. You can also download this project directly from my GitHub.

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <script type="text/javascript" src="./web3.min.js"></script>
    <script type="text/javascript">
        window.addEventListener('load', function () {
            if (typeof web3 !== 'undefined') {
                console.log('Web3 Detected! ' + web3.currentProvider.constructor.name)
                window.web3 = new Web3(web3.currentProvider);
            } else {
                console.log('No Web3 Detected... using HTTP Provider')
                window.web3 = new Web3(new Web3.providers.HttpProvider("https://mainnet.infura.io/<APIKEY>"));
            }
        })
        function getBalance() {
            var address, wei, balance
            address = document.getElementById("address").value
            try {
                web3.eth.getBalance(address, function (error, wei) {
                    if (!error) {
                        var balance = web3.fromWei(wei, 'ether');
                        document.getElementById("output").innerHTML = balance + " ETH";
                    }
                });
            } catch (err) {
                document.getElementById("output").innerHTML = err;
            }
        }
    </script>
</head>
<body>
    <h1>ETH Balance Fetcher</h1>
    <p>Enter your Ethereum Address:</p>
    <input type="text" size="50" id="address" />
    <button type="button" onClick="getBalance();">Get Balance</button>
    <br />
    <br />
    <div id="output"></div>
</body>
</html>

You can try this out right now using the version I hosted here: http://shawntabrizi.com/ethbalance/

One thing you should note is that this is a client-side JavaScript application. There are a million reasons why making client-side apps is so much easier for development, but one big downside is that your API key will be exposed to anyone who simply inspects the HTML. Please do not abuse my API key, and please do not ship a production application using a method like this unless you are prepared to get your API key terminated.

Note that I have made updates to this blog post, correcting some issues regarding best practices with Web3. You can learn more about the errors I made along the way here.

I hope this helps you get started with developing for Ethereum using Web3.js! If you liked this content, and want to support me, feel free to send donations to this address: 0xD62835Fe2B40C8411A10E7980a290270e6A23cDA

Getting started with Ethereum Wallet

Well, Cryptocurrencies have consumed my attention for the past 5 months. I first learned about Etherum back in May when I was attending a Bachelor party. Since then, I have been a relatively hands off investor, but it is time to actually start contributing and learning about the development platform.

To start, we need to install the Etherum Wallet, which also downloads the entire Etherum chain, which as of this post is over 4.3 million blocks long. The unfortunate part of this process is that it takes a lot of disk space, and if you are like me, you have a SSD for your OS and installed programs, and an HDD for storage. 4.3 million blocks is supposed to be around 70+ GB of files, which I certainly do not want on my SSD.

Etherum Wallet does not currently let you choose where to store these files, so by default they go to:

C:\Users\<username>\AppData\Roaming\Ethereum

So how can we move these files to our HDD?

Using Symbolic Links (Symlinks)!

Let’s do it!

I want to store all these files on my HDD at this folder path:

D:\Cryptocoins\Ethereum

Before I install Etherum Wallet, I should set up the symlink as so:

> Run CMD as an Administrator

mklink /j "C:\Users\<username>\AppData\Roaming\Ethereum" "D:\Cryptocoins\Ethereum"

If successful, you will get the following message:

Junction created for C:\Users\<username>\AppData\Roaming\Ethereum <<===>> D:\Cryptocoins\Ethereum

This will create a Directory Junction which basically makes the first folder path point to the second folder path. Then when I install and run Etherum Wallet, it will start to download all of the blocks into this alternate directory!

And in my HDD:

What if I migrate an existing installation?

Let’s say you already installed and downloaded the Ethereum Chain but you want to do this. Well that is also pretty easy. First copy all the contents from

C:\Users\<username>\AppData\Roaming\Ethereum

to your new location. It should be pretty large, so it may take a while. Be extra careful to make a backup of your keystore folder. Once the files have been copied over, you must delete the entire original Ethereum directory, including the Ethereum folder. Then you can run the same mklink in CMD as an Administrator.

mklink /j "C:\Users\<username>\AppData\Roaming\Ethereum" "D:\Cryptocoins\Ethereum"

If you forgot to delete the Ethereum folder, you will get this error:

Cannot create a file when that file already exists.

I hope this helps someone! It certainly helped me as I was getting started.

Adding AAD Service Principal to the Company Administrator Role using the AAD PowerShell Module

When creating a new Azure Active Directory application, developers may run into a a problem when calling the AAD Graph API where they lack the correct permissions to call the APIs they want when calling in the App Only Flow (Client Credentials Flow).

Is this message familiar to you?

"odata.error":{
    "code":"Authorization_RequestDenied",
    "message":{
        "lang":"en","value":"Insufficient privileges to complete the operation."
    }
}

The correct thing to do would be to try and investigate the permissions you have granted to your application, but there are some APIs which are not even supported through the permissions publicly exposed by the AAD Graph API. Maybe you just want to overcome this error for the time being and continue testing your end to end experience.

Using the AAD PowerShell Module, you can:

  • Give your application full access to the Graph API in the context of my tenant.

or

  • Grant your application permissions to my tenant which are not currently supported with the permissions exposed by the AAD Graph API.

“How?” you might ask. Well, you can elevate the level of access an Application has in your tenant by adding the service principal of that application to the Company Administrator Directory Role. This will give the Application the same level of permissions as the Company Administrator, who can do anything. You can follow these same instructions for any type of Directory Role depending on the level of access you want to give to this application.

Note that this will only affect the access your app has in your tenant.
Also you must already be a Company Administrator of the tenant to follow these instructions.

 

In order to make the change, you will need to install the Azure Active Directory PowerShell Module.

Once you have the module installed, authenticate to your tenant with your Administrator Account:

Connect-MSOLService

Then we need to get the Object ID of both the Service Principal we want to elevate, and the Company Administrator Role for your tenant.

Search for Service Principal by App ID GUID:

$sp = Get-MsolServicePrincipal -AppPrincipalId <App ID GUID>

Search for Directory Role by Name

$role = Get-MsolRole -RoleName "Company Administrator"

Now we can use the Add-MsolRoleMember command to add this role to the service principal.

Add-MsolRoleMember -RoleObjectId $role.ObjectId -RoleMemberType ServicePrincipal -RoleMemberObjectId $sp.ObjectId

To check everything is working, lets get back all the members of the Company Administrator role:

Get-MsolRoleMember -RoleObjectId $role.ObjectId

You should see your application in that list, where RoleMemberType is ServicePrincipal and DisplayName is the name of your application.

Now your application should be able to perform any Graph API calls that the Company Administrator could do, all without a user signed-in, using the Client Credential Flow.

Let me know if this helps!

Common Microsoft Resources in Azure Active Directory

I have seen a lot of StackOverflow posts trying to debug pretty basic errors when getting an access token to Microsoft Resources. Sometimes the issue is as simple as a typo in the “resource” value in the token request. When helping these users, I struggle to find public documentation which shows plainly the correct resource values for these different APIs!

That is going to change starting now. Here will be a list of the most popular Microsoft APIs exposed on Azure Active Directory, along with the basic information you may need to get an access token to those resources for PROD. (If you want the details for other Environments, let me know!)

Note: The Resource URI must match exactly what is written below, including any trailing ‘/’ or lack thereof.

 

Resource Name Resource URI Application ID
AAD Graph API https://graph.windows.net/ 00000002-0000-0000-c000-000000000000
Office 365 Exchange Online https://outlook-sdf.office.com/ 00000002-0000-0ff1-ce00-000000000000
Microsoft Graph https://graph.microsoft.com 00000003-0000-0000-c000-000000000000
Skype for Business Online https://api.skypeforbusiness.com/ 00000004-0000-0ff1-ce00-000000000000
Office 365 Yammer https://api.yammer.com/ 00000005-0000-0ff1-ce00-000000000000
OneNote https://onenote.com/ 2d4d3d8e-2be3-4bef-9f87-7875a61c29de
Windows Azure Service Management API https://management.core.windows.net/ 797f4846-ba00-4fd7-ba43-dac1f8f63013
Office 365 Management APIs https://manage.office.com c5393580-f805-4401-95e8-94b7a6ef2fc2
Microsoft Teams Services https://api.spaces.skype.com/ cc15fd57-2c6c-4117-a88c-83b1d56b4bbe
Azure Key Vault https://vault.azure.net cfa8b339-82a2-471a-a3c9-0fc0be7a4093

Who knows if this will actually end up helping anyone, but I hope it will!

Refresh Tokens for Azure AD V2 Applications in Flask

I have been working on a few projects recently that used Flask, a Python web framework, and Azure Active Directory to do things related to the Microsoft Graph. Using flask_oauthlib and the Azure AD V2 endpoint, it has been really easy to set up basic authentication for my web apps.

However, we quickly ran into basic authentication headaches like token expiry. It seems pretty obvious to the end user that as long as they haven’t logged out since the last time they visited the site, they should stay automatically logged in. However, if you set up a naive implementation of authentication, you will find that the access token you store for the user is only valid for a limited time; by default just 1 hour.

So do we make our user sign in every hour?

Hell no. We need to use refresh tokens which can be exchanged for NEW access tokens, all without the user being asked to sign in again.

How do we get a refresh token?

In order to get a refresh token from the Azure AD V2 endpoint, you need to make sure your application requests a specific scope: offline_access. As stated here:

When a user approves the offline_access scope, your app can receive refresh tokens from the v2.0 token endpoint. Refresh tokens are long-lived. Your app can get new access tokens as older ones expire.

If your app does not request the offline_access scope, it won’t receive refresh tokens.

So how do we do it?

Unfortunately flask_oauthlib does not directly support refresh tokens, but it does support remote methods, so we should be able to simply make a POST request for a new access token! Here is the surprisingly simple code you need:

    data = {}
    data['grant_type'] = 'refresh_token'
    data['refresh_token'] = session['refresh_token']
    data['client_id'] = microsoft.consumer_key
    data['client_secret'] = microsoft.consumer_secret
    
    response = (microsoft.post(microsoft.access_token_url, data=data)).data

That’s it! What you are not seeing here is the original code used to get an access token, but I am just doing the normal flask_oauthlib stuff (which you can find in this sample), and storing the results of the token response in the user’s session (Access Token, Expires In, and Refresh Token).

Now, in order to invoke this code at the right time, I need to create a view decorator, and Flask has a sample for almost exactly what we want to do here: a login required decorator.

Basically, we add this decorator to any view where we expect the user to be signed in. If the user has no token, we will redirect them to the login page. If they have an expired token and a refresh token, we will use the refresh token to get a new access token. Otherwise, if the token is present and valid, we simply let the view load.

Check it out in full action here:

def login_required(f):
    @wraps(f)
    def wrap(*args, **kwargs):
        if 'microsoft_token' in session:
            if session['expires_at'] > datetime.now():
                return f(*args, **kwargs)
            elif 'refresh_token' in session:
                # Try to get a Refresh Token
                data = {}
                data['grant_type'] = 'refresh_token'
                data['refresh_token'] = session['refresh_token']
                data['client_id'] = microsoft.consumer_key
                data['client_secret'] = microsoft.consumer_secret

                response = (microsoft.post(microsoft.access_token_url, data=data)).data
                
                if response is None:
                    session.clear()
                    print("Access Denied: Reason=%s\nError=%s" % (response.get('error'),request.get('error_description')))
                    return redirect(url_for('index'))
                else:
                    session['microsoft_token'] = (response['access_token'], '')
                    session['expires_at'] = datetime.now() + timedelta(seconds=int(response['expires_in']))
                    session['refresh_token'] = response['refresh_token']
                    return f(*args, **kwargs)
        else:
            return redirect(url_for('login'))
    return wrap

So now when we want to make a page require login we simply set up our view function like so:

@app.route('/home')
@login_required
def home():
    #view code goes here

Adding this kind of code to your Azure AD Flask application can really be the difference between a good and bad user experience. Let me know if this ends up helping you out!

Revoking Consent for Azure Active Directory Applications

Today I was presenting one of my hackathon projects which I worked on this year to the Identity team at Microsoft. In order for my project to work, I needed to get consent to read the mail of the signed-in user. Depending on who you talk to, a permission like this could be easy as pie to consent to or something that they would never accept. Some people fall in the middle where they are happy to consent as long as they can choose to revoke that consent after they are done playing with the app.

That is why I am writing this. How easy it is to forget that it is NOT very obvious what you need to do to revoke consent for an Azure Active Directory Application. Even people on the Identity team don’t always know! So let’s talk about how you can do it 🙂

Using the My Apps Portal for Individual User Consent

You can revoke individual user consent through the My Apps Portal. Here you should see a view of all applications that you or even your administrator (on your behalf) has consented to:

With applications your admin has consented to, all you can do is open the app, however for apps where you individually consented as a user, you can click “Remove” which will revoke consent for the application.

Using the Azure Portal to Remove Tenant Wide Consent

If you are a tenant administrator, and you want to revoke consent for an application across your entire tenant, you can go to the Azure Portal.  Whether it be for a bunch of users who individually consented or for an admin who consented on behalf of all the users, by simply deleting the application’s service principal, you will remove all OAuth 2 Permission Grants (the object used to store consent) linked to that application. Think about removing the service principal like uninstalling the application from your tenant.

You could delete the service principal a bunch of different ways like through Azure Active Directory PowerShell or through the Microsoft Graph API, but the easiest way for the average administrator is right through the Azure Portal.

Navigate to the Enterprise Applications blade in the Azure portal:

Then click “All Applications” and search for the application you want to revoke consent for:

When you click the application, you will be brought to an “Overview” section, where a tempting button called “Delete” will be at the top. Before you click this button,  you might want to take a peak at the “Permissions” section to see the types of consent that was granted to this application:

Once you feel confident that you want to delete this application, go back to “Overview” and click “Delete”!

Viola! The app and all consent associated with that app is now gone.

Scraping LinkedIn Topics and Skills Data

What are the most popular skills among LinkedIn users?
What are the most popular skills among Microsoft employees?
Other top tech companies? (Google, Amazon, Facebook, etc…)
What are the most interconnected skills?

These are questions that LinkedIn does not provide a direct answer to. However, through their “Topics Directory“, we should be able to come to these conclusions ourselves!

The Topics Directory seems to be an index over all the different skills that people have put on their profile, alphabetically ordered by skill name. Some pages, like Azure, have very specific metadata about the skill, while others like Azure Active Directory, show up in the directory, but do not have this additional metadata.

If we look at the additional metadata, you can see that it calls out a number of very interesting data points. It tells you:

  1. How many people have this skill
  2. The top 10 companies that have employees who register this skill
  3. (My guess) The top skills that people have who also have this skill
  4. (My guess) The top related skills

Now clearly, there is some poor Web Design, in that there are two different sections, both with the same title “Top Skills”, but contain different data. We will have to do our own interpretation of what this data exactly means, but nonetheless, the data is all useful.

So how do we start scouring this data to answer the questions I proposed at the start of this post? Well, by scraping it of course, and storing it into our own database. Now, this is not an original idea, but certainly I have not seen anyone collect the level of data which I am interested in. I want to have a copy of all the data points above for each topic, all in a single list!

So let’s do it!

Of course we will be using Python + Beautiful Soup + Requests. You can find the latest version of my LinkedIn scraper on my GitHub. Here, I will only be looking at the main function, which describes the logic of my code, not the specific functions which actually does the scraping. You can find that on my GitHub.

from bs4 import BeautifulSoup
import requests
import string
import re
import json

# ...
# sub-functions removed, check GitHub for full source
# ...

def main():
    letters = list(string.ascii_lowercase)
    letters.append('more')
    base_url = "https://www.linkedin.com/directory/topics-"
    for letter in letters:
        letter_url = base_url + letter + "/"
        content = get_content(letter_url)
        for con in content:
            if letter == 'y' or letter == 'z':
                sub_content = content
            else:
                letter_page_url = con.find("a")
                print(letter_page_url)
                if letter_page_url.has_attr('href'):
                    sub_content = get_content(letter_page_url['href'])
                else:
                    sub_content = None
            for sub_con in sub_content:
                topic_url = sub_con.find("a")
                topic = scrape_data(topic_url)
                create_json(topic)
            if letter == 'y' or letter == 'z':
                break

To scrape this site, we are basically figuring out the pattern which generates these pages. LinkedIn organizes these topics first by letter, https://www.linkedin.com/directory/topics-{letter}/. Then on each “letter page”, they group the topics by alphabetical order, in groups, https://www.linkedin.com/directory/topics-{letter}-{number}/. Finally, if you navigate to the specific topic, you will get the final page with data, https://www.linkedin.com/topic/{topic}.

There are a few exceptions to this pattern, which added complexity to the scraper. Basically the letters Y and Z do not have enough topics to be able to put them in groups, which means instead of navigating 3 pages deep to get the data, we need to navigate only 2 pages deep. You can see I handle this situation in my scraper. Other than that, once I get the data off the page, I put it into a JSON file for later usage!

One thing to note, but that I will not go into detail here about is that LinkedIn actually blocks scrapers in general, by creating a 999 response when you try to get data using a bot. If you want to run this script, you will have to overcome this. If you look online, people mention that you might need to update the user-agent passed in the headers of the web requests, but this did not work for me. I might go into detail about this during another post.

Results

So, let’s look at some of the data. I can import the JSON as an array of dictionaries in Python, and then try and write some queries to get data from it. I am not claiming to write the best or most efficient queries, but hopefully they will get the correct data.

Loading the data:

with open(r'C:\Users\shawn\Documents\GitHubVisualStudio\LinkedIn-Topic-Skill-Analysis\results\linkedin_topics_7-23-17.json') as data_file:
    data = json.load(data_file)

How many topics are there total?

len(data)
33188

What are the most popular overall topics/skills?

ordered_by_count = sorted(data, key=lambda k: k['count'] if isinstance(k['count'],int) else 0, reverse=True)
for skill in ordered_by_count[:20]:
    print(skill['name'])
Management - 69725749
Microsoft - 55910552
Office - 46632581
Microsoft Office - 45351678
Planning - 34397412
Microsoft Excel - 32966966
Leadership - 31017503
Customer Service - 30810924
Leadership Management - 25854094
Word - 25793371
Project - 25766790
Project+ - 25766790
Microsoft Word - 25567902
Business - 25374740
Customer Management - 24946045
Management Development - 24207445
Development Management - 24207409
Project Management - 23922491
Marketing - 23047665
Customer Service Management - 22856920

What are the top <Company> Skills?

company = 'Microsoft'
company_skills = []
for skill in ordered_by_count:
    if skill['companies'] is not None:
        if company in skill['companies']:
            company_skills.append(skill)

order_by_company = sorted(company_skills, key=lambda k: k['companies'][company], reverse=True)
for skill in order_by_company[:20]:
     print(skill['name'], "-", skill['companies'][company])

Microsoft

Cloud - 74817
Cloud Computing - 74817
Cloud-Computing - 74817
Cloud Services - 74817
Management - 73123
Management Skills - 73123
Multi-Unit Management - 73123
Enterprise - 54516
Enterprise Software - 54516
Software Development - 53201
Project Management - 52083
Project Management Skills - 52083
PMP - 52083
PMI - 52083
Strategy - 43983
SaaS - 41450
Software as a Service - 41450
Program Management - 40749
Business Intelligence - 39291
C# - 39158

Google

Java - 23225
Strategy - 22235
Marketing - 21672
Data-driven Marketing - 21672
Python - 20788
Software Development - 20406
C++ - 20199
Social Media - 20082
Social Networks - 20082
Digital Marketing - 19942
Online Advertising - 19922
Marketing Strategy - 16882
Linux - 16272
JavaScript - 14567
JavaScript Frameworks - 14567
C - 14460
C Programming - 14460
Online Marketing - 13925
Online-Marketing - 13925
Social Media Marketing - 12931

Amazon

Leadership - 44329
Leadership Skills - 44329
Microsoft Office - 42713
Office for Mac - 42713
Customer Service - 36176
Microsoft Excel - 33403
Java - 25609
Word - 23314
Microsoft Word - 23314
PowerPoint - 22318
Microsoft PowerPoint - 22318
Social Media - 22110
Social Networks - 22110
C++ - 19619
Training - 19250
Marketing - 18826
Data-driven Marketing - 18826
Software Development - 18521
Public Speaking - 17366
C - 16813

Facebook

Digital Marketing - 4973
Online Advertising - 4334
Digital Strategy - 3399
Online Marketing - 3012
Online-Marketing - 3012
Facebook - 2883
Algorithms - 2881
Mobile Marketing - 2163
Machine Learning - 2103
Distributed Systems - 2033
User Experience - 1971
UX - 1971
Web Analytics - 1682
SEM - 1626
Computer Science - 1440
Google Analytics - 1261
Adwords - 1093
Google AdWords - 1093
Scalability - 1057
Mobile Advertising - 919

What are the top interconnected skills?

skill_count = {}
for topic in data:
    if topic['skills'] is not None:
        for top_skill in topic['skills']:
            if top_skill not in skill_count:
                skill_count[top_skill] = 1
            else:
                skill_count[top_skill] += 1
    if topic['topSkills'] is not None:
        for top_skill in topic['topSkills']:
            if top_skill not in skill_count:
                skill_count[top_skill] = 1
            else:
                skill_count[top_skill] += 1

for skill in sorted(skill_count, key=skill_count.get, reverse = True)[:20]:
    print(skill, "-", skill_count[skill])
Microsoft Office - 11081
Management - 8845
Customer Service - 7010
Project Management - 6902
Microsoft Excel - 4884
Leadership - 4682
Social Media - 3883
Research - 3798
Public Speaking - 3243
Marketing - 2644
Microsoft Word - 2426
Sales - 2335
SQL - 2322
Engineering - 2300
Business Development - 2071
Strategic Planning - 1879
Java - 1792
Adobe Photoshop - 1555
JavaScript - 1488
Microsoft PowerPoint - 1483

 

There is so much more we can do with this data, and I do have plans! I just can’t talk about them here. In a related note, I am super excited for the Microsoft Hackathon happening this next week. I will be using these tools, and hopefully more to accomplish an awesome project. Maybe more to share here in the future!

Clients and Tokens and Claims! Oh My!

Let me just jump to the point with this post: Client applications should not depend on claims in access tokens to gather data about the signed-in user or anything about the authenticated session.

Time and time again, I have seen client applications complain to me that certain claims, like group membership claims, are not appearing in the access token they receive, and they ask me how to enable this. They incorrectly assume that if they go into their application manifest, and change the “groupMembershipClaims” settings, that they will be able to start getting claims, but everyone eventually finds out… it doesn’t work!

Let’s take a look at source material; from the OAuth 2 specification:

An access token is a string representing an authorization issued to the client. The string is usually opaque to the client.

Unfortunately, the OAuth 2 specification is intentionally broad, but in summary, the ‘access token’ that is given to a client should only really be explored by the specified audience of the token. Some implementations of OAuth 2 do not even pass a JWT token to the client. Instead they pass a unique string, and then the resource exchanges that string for the actual token using a signed request. Alternatively, other implementations pass an encrypted JWT token rather than just a signed token. This means that the resource application uploads a token signing key which the authorization server uses to encrypt the full token. That means that the only person who can look at the claims in the token is the resource who also has the private key for decryption.

The implementation of OAuth 2 that I am most familiar with, Azure Active Directory,  issues a signed token, which means that its content is completely visible to the client. In the future, Azure AD may add support for encrypted tokens, which means that clients are going to have to start following the correct practices.

Need to know about the user signed into your web application?

>> Get an ID token! These are meant for client consumption!

Need to know which groups a user is a member of?

>> Get an access token to the AAD or Microsoft Graph API and query the API!

Now lets go back to the original problem. If groupMembershipClaims are not meant for clients to get the claims in the access token, what are they used for? You might have figured out by now, but they are for resource applications to get the claims in the access token!

Lets show an example. To set up, I have registered two Azure AD Web Apps/APIs called Web API 1 and Web API 2. Both of these applications are identical, except Web API 1 has the setting “groupMembershipClaims”: “All”, and the other is set to null, which is default. I have to set up a fake App ID URI for both apps, and I have to make sure that each application has the other set as a “required permission”.

I will be using my PowerShell Scripts to quickly get two access tokens. One where the client is Web API 1 and the resource is Web API 2, and vice versa.

Let’s look at the results, using my JWT Decoder to look at the payload:

Payload 1:  Client = Web API 1, Resource = Web API 2

{
    "aud": "https://shawntest.onmicrosoft.com/WebApi2",
    "iss": "https://sts.windows.net/4a4d599f-e69d-4cd8-a9e1-9882ea340fb5/",
    "iat": 1500243353,
    "nbf": 1500243353,
    "exp": 1500247253,
    "acr": "1",
    "aio": "ATQAy/.../oU",
    "amr": [ "rsa", "mfa" ],
    "appid": "eb7b6208-538c-487b-b5b5-137ac6ab6646",
    "appidacr": "1",
    "email": "shtabriz@microsoft.com",
    "family_name": "Tabrizi",
    "given_name": "Shawn",
    "idp": "https://sts.windows.net/72f988bf-86f1-41af-91ab-2d7cd011db47/",
    "in_corp": "true",
    "ipaddr": "XX.XXX.XXX.XXX",
    "name": "Shawn Tabrizi",
    "oid": "41bdce9b-3940-40a9-b2f2-03a003ad599c",
    "platf": "3",
    "scp": "user_impersonation",
    "sub": "hfS9IZ_..._JW8c5Gg",
    "tid": "4a4d599f-e69d-4cd8-a9e1-9882ea340fb5",
    "unique_name": "shtabriz@microsoft.com",
    "ver": "1.0"
}

Payload 2: Client = Web API 2, Resource = Web API 1

{
    "aud": "https://shawntest.onmicrosoft.com/WebApi1",
    "iss": "https://sts.windows.net/4a4d599f-e69d-4cd8-a9e1-9882ea340fb5/",
    "iat": 1500243330,
    "nbf": 1500243330,
    "exp": 1500247230,
    "acr": "1",
    "aio": "ATQAy/...BLDunA",
    "amr": [ "rsa", "mfa" ],
    "appid": "554e427d-36c3-4a77-89a5-a082ee333e12",
    "appidacr": "1",
    "email": "shtabriz@microsoft.com",
    "family_name": "Tabrizi",
    "given_name": "Shawn",
    "groups": [ "0f4374e6-8131-413e-b32b-f98bfdb371ed" ],
    "idp": "https://sts.windows.net/72f988bf-86f1-41af-91ab-2d7cd011db47/",
    "in_corp": "true",
    "ipaddr": "XX.XXX.XXX.XXX",
    "name": "Shawn Tabrizi",
    "oid": "41bdce9b-3940-40a9-b2f2-03a003ad599c",
    "platf": "3",
    "scp": "user_impersonation",
    "sub": "xy..._zGJEZnIB4",
    "tid": "4a4d599f-e69d-4cd8-a9e1-9882ea340fb5",
    "unique_name": "shtabriz@microsoft.com",
    "ver": "1.0",
    "wids": [ "62e90394-69f5-4237-9190-012177145e10" ]
}
Note that we only get the group membership claims when the resource application has this setting, not the client application. The client application has no power to control the types of claims in the token, because ultimately the token is not for them!
If you are building a client application using Azure Active Directory, please do not use the access token to try and get information about the authenticated session. The correct practice here is to request separately an ID Token , or to call the AAD / Microsoft Graph API to get the information you need. I hope you learned exactly how to use the “groupMembershipClaims” property and I hope this helps you build better apps in the future!