How to generate Terabytes of IOT data with Azure Data Lake Analytics

Hi everyone, during one of my projects I’ve been asked the following question:

I’m currently storing my IOT sensor data in Azure Data Lake for analysis and feature engineering, but I still have very few devices, so not much data, and I can’t tell how fast my queries and transformations will be once, with many more devices and months or years of sensor data, my data lake grows to several terabytes.

Well, in that case let’s quickly generate those terabytes of data using U-SQL capabilities!

Let’s assume that our data resembles the following:

deviceId, timestamp, sensorValue, …….

So for each IOT device we have a unique identifier called deviceId (let’s assume it is a combination of numbers and letters), a timestamp indicating, with millisecond precision, when the IOT event was generated, and finally the sensor values at that moment (temperature, speed, etc.).

The idea is the following: given a real deviceId, generate N “synthetic deviceIds” that all have the same data as the original device. So if we have, for example, 5 real deviceIds each with 100.000.000 records (500.000.000 records in total), and we generate 1000 synthetic deviceIds for each real deviceId, we will have 1000x5x100.000.000 additional records, so 500.000.000.000 records.

But we can expand the amount of synthetic data even more by playing with time. For example, if our real data has events only for 2017, we can duplicate the entire dataset for each of the years from 2006 to 2016 and get eleven more years’ worth of records.
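To make the arithmetic explicit, here is a small Python sketch using the example figures above. It assumes (my reading, not stated in the text) that the dataset copied into each extra year is the synthetic 2017 dataset.

```python
# Record-count arithmetic using the example figures from the text.
real_devices = 5
records_per_device = 100_000_000
synthetic_per_real = 1000

real_records = real_devices * records_per_device          # 500.000.000
synthetic_records_2017 = synthetic_per_real * real_records  # 500.000.000.000

# Assumption: the synthetic 2017 data is copied once for each year 2006..2016.
extra_years = len(range(2006, 2017))
synthetic_records_all_years = synthetic_records_2017 * (1 + extra_years)

print(f"real records:            {real_records:,}")
print(f"synthetic records, 2017: {synthetic_records_2017:,}")
print(f"synthetic records, {1 + extra_years} years: {synthetic_records_all_years:,}")
```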

Here is some sample C# code that generates the synthetic deviceIds:

Note the GetArrayOfSynteticDevices function, which will be executed inside the U-SQL script.
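As a rough illustration of the idea (in Python rather than C#, and with a naming scheme that is purely my assumption), such a function could take a real deviceId and a count and return that many distinct synthetic ids:

```python
from typing import List


def get_array_of_synthetic_devices(device_id: str, count: int) -> List[str]:
    """Return `count` distinct synthetic ids derived from one real device id.

    The "<real id>_S<index>" scheme is hypothetical; any scheme works as long
    as the generated ids are unique and cannot collide with real ids.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    # Zero-pad the index so that all ids have the same length.
    width = len(str(max(count - 1, 0)))
    return [f"{device_id}_S{i:0{width}d}" for i in range(count)]


ids = get_array_of_synthetic_devices("A1B2", 1000)
print(ids[0], ids[-1], len(ids))
```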

Before using it we have to register the assembly in our Data Lake account and database (in my case the master one):

DROP ASSEMBLY IF EXISTS master.[Microsoft.DataGenUtils];
CREATE ASSEMBLY master.[Microsoft.DataGenUtils] FROM @"location of dll";

Now we can read the original IOT data and create the additional data:

REFERENCE ASSEMBLY master.[Microsoft.DataGenUtils];

@t0 =
EXTRACT deviceid string,
timeofevent DateTime,
sensorvalue float
FROM "2017/IOTRealData.csv"
USING Extractors.Csv();

// Let's get the distinct list of all the real deviceIds
@t1 =
SELECT DISTINCT deviceid AS deviceid
FROM @t0;

// Let's calculate for each deviceId an array of 1000 synthetic devices

@t2 =
SELECT deviceid,
Microsoft.DataGenUtils.SyntheticData.GetArrayOfSynteticDevices(deviceid, 1000) AS SyntheticDevices
FROM @t1;

// Let's assign to each array of synthetic devices the same data as the corresponding real device

@t3 =
SELECT a.SyntheticDevices,
de.timeofevent,
de.sensorvalue
FROM @t0 AS de INNER JOIN @t2 AS a ON de.deviceid == a.deviceid;

// Let's use the EXPLODE function to expand the array into records

@t1Exploded =
SELECT emp AS deviceid,
de.timeofevent,
de.sensorvalue
FROM @t3 AS de
CROSS APPLY
EXPLODE(de.SyntheticDevices) AS dp(emp);

// Now we can write the expanded data

OUTPUT @t1Exploded
TO "SyntethicData/2017/expanded_{*}.csv"
USING Outputters.Csv();

Once you have the expanded data for the whole of 2017, you can use the C# DateTime functions that add years, months or days to a date, apply them to the timeofevent column and write the new data to a new folder (for example SyntethicData/2016, SyntethicData/2015, etc.).
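Just to illustrate the date shift, here is a minimal Python sketch. The helper name shift_years is hypothetical. It mirrors what C# DateTime.AddYears does, including moving February 29 to February 28 when the target year is not a leap year.

```python
from datetime import datetime


def shift_years(ts: datetime, years: int) -> datetime:
    """Move a timestamp by a whole number of years, keeping the time of day."""
    try:
        return ts.replace(year=ts.year + years)
    except ValueError:
        # February 29 does not exist in the target year: fall back to February 28.
        return ts.replace(year=ts.year + years, day=28)


event = datetime(2017, 3, 15, 10, 30, 0, 123000)
for offset in (-1, -2):
    print(shift_years(event, offset).isoformat())
```

Applied to every row of the expanded 2017 data with offsets from -1 to -11, this gives the events for 2016 back to 2006.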



UniFi – Install a UniFi Cloud Controller on Azure

Hi everyone, this time I want to share one of my weekend projects, inspired by this article by Troy Hunt. Like Troy, I also experienced the pain of a single router/gateway/access point device and decided to switch to UniFi devices.

On the net you can find dozens of tutorials on how to assemble the various bits; here instead I want to explain how to set up the UniFi Controller software on Azure in a few simple steps.

Step 1: Go to the following website and register. You will receive $200 of Azure credits that can be used in the first month. Alternatively you can register for the Visual Studio Dev Essentials program and get $25 each month for 1 year.

Step 2: Once you have your subscription up and running, you can provision a Linux VM by clicking on the big “+” button and searching for Linux Ubuntu Server:

Step 3: After selecting the virtual machine, give it a name, a username and a password, choose HDD as the VM disk type, pick a resource group name that you like and a data center near where you live.

Step 4: Set the size of the VM (I used A2 Standard, but you can also try A1 or even A0; also remember that you can schedule the VM to start/stop only when you need it and save a lot of money):

Step 5: In the additional settings leave everything at the default values and finally hit the purchase button!

Step 6: After a few minutes or even less your VM is ready and you should be able to see a screen like this:

Write down the Public IP Address because you will need it shortly.

Step 7: Open the ports that the Controller needs in order to work correctly.

First go to the network interface of the VM:

Select the first one (there should be only one):

Now select the network security group, clicking first on link no. 1 and then on link no. 2:

Finally, add the necessary inbound rules exactly as described here:
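If you prefer the command line to the portal, the same rules can be scripted. The Python sketch below only builds the Azure CLI commands as text. The port list (8080 for device inform, 8443 for the web interface, 3478/UDP for STUN) is my assumption based on commonly documented controller defaults, so check it against the page linked above; the resource group name, NSG name and starting priority are hypothetical.

```python
from typing import List, Tuple

# (rule name, protocol, port). Assumed defaults, to be verified for your controller version.
UNIFI_INBOUND_RULES: List[Tuple[str, str, int]] = [
    ("unifi-inform", "Tcp", 8080),
    ("unifi-web-ui", "Tcp", 8443),
    ("unifi-stun", "Udp", 3478),
]


def nsg_rule_commands(resource_group: str, nsg_name: str,
                      rules: List[Tuple[str, str, int]],
                      first_priority: int = 1100) -> List[str]:
    """Build one `az network nsg rule create` command per inbound rule."""
    commands = []
    for offset, (name, protocol, port) in enumerate(rules):
        commands.append(
            "az network nsg rule create"
            f" --resource-group {resource_group}"
            f" --nsg-name {nsg_name}"
            f" --name {name}"
            f" --priority {first_priority + 10 * offset}"
            " --direction Inbound --access Allow"
            f" --protocol {protocol}"
            f" --destination-port-ranges {port}"
        )
    return commands


for command in nsg_rule_commands("my-unifi-rg", "my-unifi-nsg", UNIFI_INBOUND_RULES):
    print(command)
```

Each rule needs a distinct priority inside the network security group, which is why the sketch increments it for every rule.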

Step 8: Connect to the IP address you wrote down, using PuTTY on Windows or the standard shell on macOS, and install the UniFi controller software with these commands:

echo "deb <repository URL> unifi5 ubiquiti" | sudo tee -a /etc/apt/sources.list

sudo apt-key adv --keyserver <keyserver> --recv 06E85760C0A52C50

sudo apt-get update

sudo apt-get install unifi

Step 9: Connect to the controller web interface at https://IP_Address:8443/ and complete the UniFi wizard:

Finally, you can proceed to adopt your UniFi devices using Layer 3 adoption!

Digital Marketing in practice (final episode)

In the last episode of the series we talked about the holy grail of the Digital Marketing landscape and how it is deeply connected to the identity of our customers. So let’s recap for a moment what we need:

  1. All our customer data (web logs, app logs, purchases, calls, surveys, etc.) marked with the same identity Id, in order to properly assign every event to the same customer, and we need this data to be collected in real/near real time.
  2. To define what our targets are (sales, customer satisfaction, market penetration, etc.) and define a strategy to reach those goals.
  3. To define the strategy, we use the data collected at point 1 to identify the patterns that lead to high-revenue customers, abandoned carts, poor reviews, good surveys, etc.
  4. Once our overall strategy (sales, discounts, promos, coupons, social, etc.) is defined, we need to put it into practice by defining our customer journeys (for example, look at this or this). So literally we have to define, on each touch point (where), what “actions” will happen and when, who will be the target of those actions, and what subsequent “sub-actions” or steps have to happen automatically at every step of the journey.
  5. To produce on all the touch points the respective UI associated with the actions.
  6. To go back to point 1, evaluate the new data, check if the strategy is working and, if necessary, take corrective actions.

Now in a hypothetical “perfect world” we would be finished, but reality is much more complicated than that 🙂.

In fact, while we can define journeys and customer segments, profiles and target audiences, we need some “binding” to happen between our defined journeys and the real touch points.

An example? Let’s assume we define a coupon/loyalty initiative; this alone means quite a large list of system configurations and actions:

  1. Define the new coupon initiative in the loyalty system
  2. Define the budget allocated for those coupons and the limits associated
  3. Integrate the new coupons with the e-commerce so that they are applied and accepted at order time
  4. Integrate the journey builder actions into the e-commerce so that the e-commerce UI displays the promotion’s new look & feel
  5. Integrate the journey builder sub-steps, if any, into the e-commerce UI engine
  6. Tag properly all the consumer journey steps in order to collect back the data points of the journey
  7. Etc..

Now repeat this for the marketing campaign system that handles email, SMS and notifications, repeat this for the call center, etc.


As you can imagine, we need a single unified collection of products (identity, e-commerce, apps, CRM, marketing email/SMS, etc.), all connected and from the same vendor, and the “unified data collector system” must also be the customer journey builder. In fact, we can reasonably tell whether our strategies are effective only if we can observe, in the very same tool, whether the journeys we designed are working or not (what if we define a journey of 12 steps and almost nobody gets past step 3?).

I guess that if you go to your preferred search engine now and do some basic research, you will find 20+ vendors claiming to have this kind of combined solution in place.

In reality, even if we assume that all the 20+ vendors have fantastic and well-connected platforms, every enterprise already has a gargantuan amount of systems in place, and you cannot “turn off” everything and start from scratch.

At the same time, even if you start from zero, the cost and the lock-in risk associated with ALL IN ONE solutions are often so high that you may end up considering a Lego approach anyway.


So what is our recommendation here ?


The right approach is perhaps neither build nor buy; I call it smart integration.

Smart integration means the following:

  1. Define your own MML : marketing markup language
  2. From the central data repository define the personalized MML journey for each customer
  3. Write this MML journey on the identity of each customer
  4. Build/configure on all the touch points the functionality needed to read the MML journey from the identity itself (leveraging first the customer device/browser to perform the integration) and translate it into meaningful actions on that specific touch point (email template on marketing automation, call center Next Best Action on the CRM, etc.)
  5. Collect all the data, evaluate, correct and start again 🙂

An example of MML?

You can start simply with something like this:
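As a purely hypothetical sketch in Python (the field names, values and the actions_for helper are my assumptions, not a standard), an MML journey could be a small structured document written on the customer identity. Each touch point then reads it and picks out only the actions addressed to itself:

```python
import json

# Hypothetical MML journey for one customer; every field name is an assumption.
mml_journey = {
    "customerId": "identity-123",
    "journey": "welcome-coupon",
    "steps": [
        {"touchPoint": "email", "action": "send_template", "value": "welcome_gold"},
        {"touchPoint": "ecommerce", "action": "show_banner", "value": "coupon_10_percent"},
        {"touchPoint": "call_center", "action": "next_best_action", "value": "offer_upgrade"},
    ],
}


def actions_for(touch_point: str, journey: dict) -> list:
    """Return the journey steps that a given touch point has to act on."""
    return [step for step in journey["steps"] if step["touchPoint"] == touch_point]


# The journey travels as text on the identity record; a touch point parses it back.
serialized = json.dumps(mml_journey)
print(actions_for("ecommerce", json.loads(serialized)))
```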


Now, if you want to create a unified MML definition for the main strategies and touch points, I think it would be a fantastic idea and for sure a very successful startup!


Digital Marketing in practice (part 4)

We finished part 3 by defining the need for a customer identity provider that can be seamlessly integrated into all our touch points, but this also means that we can personalize our front ends only when our customers log in with our identity provider.

Can we personalize the appearance and the offers of our front ends without requiring the user to login?

If we look back at part 2, during RTB the combined systems essentially do this job: they recognize the user by a cookie or device Id and trigger the “right” advertising for him according to his profile (the real-time auction is triggered only on the brands/campaigns that have this user profile in their target audience).

Can we use cookies/device ids to do the same?

In reality we can do even more than that. A DMP can “see” the same customer as two or more different customers because he uses a laptop with different browsers and a smartphone (in reality DMPs use very sophisticated algorithms to match cookies and device ids with IP addresses and other variables, in order to remove duplicates when possible and build unified profiles…). Since we have an identity provider in place, when the customer logs in we can match the same identity Id with all the different device ids and cookie ids and correctly target the very same customer.

However, once again, we don’t want to build such a complicated system completely from scratch, and there are options available on the market.

We can, for example, leverage the Google Analytics (GA) Id coupled with our identity Id as the GA CRM Id (some examples are listed here) and use this combo to provide personalization even when customers are not logged in (GA Id -> identity Id -> customer identified).

Of course we need to store this relationship somewhere, and the NoSQL structure of the identity provider can be a nice place (but it will not be as simple as you might imagine 😉).
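To make the relationship concrete, here is a minimal in-memory Python sketch. The class name, the id formats and the storage are hypothetical; a real identity provider would persist these links on the customer record. At login we link the tracking id seen on the device to the identity Id, and later an anonymous visit with the same tracking id resolves back to the customer.

```python
from typing import Dict, Optional


class IdentityLinks:
    """Maps tracking ids (cookies, device ids, GA ids) to one identity Id."""

    def __init__(self) -> None:
        self._identity_by_tracking_id: Dict[str, str] = {}

    def link(self, tracking_id: str, identity_id: str) -> None:
        """Called at login time, when both ids are known."""
        self._identity_by_tracking_id[tracking_id] = identity_id

    def resolve(self, tracking_id: str) -> Optional[str]:
        """Called on an anonymous visit; returns None for unknown ids."""
        return self._identity_by_tracking_id.get(tracking_id)


links = IdentityLinks()
links.link("GA1.2.111.222", "identity-123")  # laptop browser
links.link("GA1.2.333.444", "identity-123")  # smartphone
print(links.resolve("GA1.2.333.444"))
```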

Another way to do it is to have the personalization/promotion engine directly integrated with a DMP and use the DMP segments to define the personalization on the front end, but if we plan to leverage our identity provider correctly, this is not a good idea.

If instead you plan to have a “no-registration/login” website, this technique can be really useful.

Now if we assign to each customer one or more “tags” that say, for example, whether the customer is Gold, Silver or Bronze:

and we write those tags directly into the identity record of the customer, then any front end that is able to read from the identity provider can also do personalization and promotions by looking at those tags, right?

And since the identity is the same across all front ends, we can always have the right personalization for our customers, right?

Well, in theory yes; in practice we need something more to achieve this:

1) A personalization / loyalty / promotion engine on the front end, which usually reads from the front-end db

2) A push-down operation that copies the “tags” from the identity provider to the local front end

3) A magic system that writes to the identity provider the “right tags” for each customer and also coordinates what “gold/silver/bronze” means for all the various actors:

  • what is the email template for a gold customer?
  • what is the discount for a bronze one in the e-commerce shop?
  • what does silver mean for the smartphone app?

If you pick the right identity provider and the right front ends, steps 1-2 should require only some minor configuration on the identity provider adapter for that front end.
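Steps 1 and 2 can be sketched in a few lines of Python. Everything here is hypothetical: the record layout, the function names and the discount values are only there to show the flow, where tags are copied from the identity provider to the front-end db and the local engine decides what a tag means for that front end.

```python
from typing import Dict, List

# Hypothetical identity records with tags already written by "step 3".
identity_provider: Dict[str, Dict[str, List[str]]] = {
    "identity-123": {"tags": ["gold"]},
    "identity-456": {"tags": ["bronze"]},
}

# What each tag means for this specific front end (an e-commerce shop); made-up values.
DISCOUNT_BY_TAG = {"gold": 0.20, "silver": 0.10, "bronze": 0.05}


def push_down_tags(identity_store: Dict[str, Dict[str, List[str]]],
                   front_end_db: Dict[str, List[str]]) -> None:
    """Step 2: copy the tags of every customer into the local front-end db."""
    for customer_id, record in identity_store.items():
        front_end_db[customer_id] = list(record.get("tags", []))


def discount_for(customer_id: str, front_end_db: Dict[str, List[str]]) -> float:
    """Step 1: the local engine reads only the front-end db."""
    tags = front_end_db.get(customer_id, [])
    return max((DISCOUNT_BY_TAG.get(tag, 0.0) for tag in tags), default=0.0)


shop_db: Dict[str, List[str]] = {}
push_down_tags(identity_provider, shop_db)
print(discount_for("identity-123", shop_db))
```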

Step 3 is the holy grail of the overall landscape and we will look at it in the next part.