From Windows 10 to macOS Sierra without admin privileges

Hi everyone, thanks to my manager and my new employer I was recently able to switch from a Windows 10 laptop to a shiny MacBook Pro, and I want to share some tips and tricks for anyone about to do the same. First, the basics: why did I choose to switch? Well, I have had only Apple devices at home since 2009 and I have always loved their consistency and “stability”, but I never had the opportunity to actually work with a Mac, so this is also a learning experience for me. If you have never used a Mac, the first obstacles will be the shortcuts (CTRL+C and CTRL+V), the mouse clicks (in particular the right click on the trackpad), the two-finger scrolling on the trackpad and now the shiny and mysterious Touch Bar. Once past this first shock, you will quickly get used to the magic search experience of Spotlight, the backups-for-dummies of Time Machine and the well-known experience of the App Store.

Now let’s focus on the work-related stuff: you can finally have Office 2016 on a Mac too, but it is miles away from the functionality and ease of use of Office 2016 on Windows. The differences are not immediately evident, but if you use Office professionally you will quickly find the missing pieces.

The solution? Go to the App Store, purchase Parallels Lite and enjoy Linux and Windows virtual machines. You can run VMs without being an admin because Parallels Lite uses the native hypervisor available on the Mac since Yosemite.

Thanks to this I was able to get back several “life saving” applications that I use daily, like Power BI Desktop, SQL Server Management Studio and Visual Studio 2017. To be honest, they do have Mac counterparts, but the functionality missing from those versions is too significant to live with them alone.

So I ended up with a Windows 10 VM full of software; why not just use Windows directly, then? Well, with the Windows VM I use Windows only for the apps that run great on that platform, and if the system becomes unstable I can keep working normally on my Mac without losing anything while Windows does its own “things” 🙂.

When needed I use an Ubuntu VM with Docker and VS Code, following the same segregation-of-duties principle (main OS fast and stable, guest OS with rich and dedicated software).

I now often work this way: SQL Server is hosted on Linux, I import/export external data easily with SQL Server Management Studio from Windows, I run PySpark notebooks on Docker accessing the same data, and finally I build visualizations with Power BI Desktop on Windows.

If, like me, you face strict policies around admin accounts, here is a tip: do you remember the concept of portable apps on Windows? Well, on the Mac you can do the same with some (not all) of the applications distributed outside the App Store (almost everything inside the App Store can be installed without admin privileges).

The technique to make a Mac application “portable” is simply a double extraction: expand the pkg file (for example with pkgutil --expand) and then extract its Payload archive into a folder you can write to (like your desktop); you can check the details here and here, and then run the application from whatever location you like.

The exceptions will be:

  1. Applications not signed by a recognized and well-known developer or software house
  2. Applications that on startup ask you to install additional services
  3. Applications that, before being launched, require the registration of specific libraries/frameworks

There are cases (like Azure Machine Learning Workbench) where the installer actually writes everything into your user account folders, but the last step is copying the UI app into the Applications folder, and this will fail if you are not an admin. The solution is to look inside the installer folders and find, inside the JSON files, the locations of the downloaded packages. Once you find the URL of the missing one (use the installer error message to identify the package it was not able to copy), download it locally and run the app from any location; it should work without problems.

 


Jazoon 2017 AI meet Developers Conference Review

Hi, I had the opportunity to participate in this conference in Zurich on 27 October 2017 and attend the following sessions:

  • Build Your Intelligent Enterprise with SAP Machine Learning
  • Applied AI: Real-World Use Cases for Microsoft’s Azure Cognitive Services
  • Run Deep Learning models in the browser with JavaScript and ConvNetJS
  • Using messaging and AI to build novel user interfaces for work
  • JVM based DeepLearning on IoT data with Apache Spark
  • Apache Spark for Machine Learning on Large Data Sets
  • Anatomy of an open source voice assistant
  • Building products with TensorFlow

Most of the sessions have been recorded and they are available here:

https://www.youtube.com/channel/UC9kq7rpecrCX7S_ptuA20OA

The first session was more of a sales presentation, with pre-recorded demos, of SAP’s AI capabilities, mainly in their cloud.

But it included some interesting ideas, like the Brand Impact video analyzer that computes how much airtime is filled by specific brands inside a video.

Another good use case was the automatic recognition of defective products using an image-similarity distance API.

The second session was about the new AI capabilities offered by Microsoft and was divided into two parts:

Capabilities for data scientists who want to build their own Python models:

  • Azure Machine Learning Workbench, an Electron-based desktop app that mainly accelerates data preparation tasks using a “learn by example” engine that creates data preparation code on the fly.

  • Azure Notebooks, a free but limited cloud-based Jupyter notebook environment to share and reuse models/notebooks.

  • Azure Data Science Virtual Machine, a pre-built VM with the most common data science packages (TensorFlow, Caffe, R, Python, etc.).

Capabilities (e.g. face, age, sentiment, OCR and handwriting detection) for developers who want to consume Microsoft’s pre-trained models by calling the Microsoft Cognitive Services APIs directly.

The third session was more of an “educational presentation” about deep learning and how, at a high level, a deep learning system works; however, the talk covered some interesting topics:

  • The existence of several pre-trained models that can be used as-is, especially for featurization and/or transfer learning

  • How to visualize neural networks with web sites like http://playground.tensorflow.org
  • A significant number of demos showcasing DNN applications that run directly in the browser

The fourth session was also interesting, because the speaker clearly explained the possibilities and limits of the current application development landscape, and in particular of enterprise bots.

Key takeaway: bots are far from being smart, and people don’t want to type text.

Suggested approach: bots are new apps that reach their “customers” in the channels they already use (Slack, for example), and these new apps, by using the context and the channel’s functionality, have to extend and at the same time simplify the IT landscape.

Example: a bot in a Slack channel notifies a manager of an approval request, and the manager can approve or deny it directly in Slack without leaving the app.

The fifth and the sixth talks were rather technical/educational, covering specific frameworks (IBM SystemML for Spark) and model portability (PMML), with some good points about hyperparameter tuning using a Spark cluster in iterative mode and DNN autoencoders.

The seventh talk was about the open source voice assistant Mycroft and the related open source device schematics.

The session consisted mainly of live demos showcasing several open source libraries that can be used to create a device with Alexa-like capabilities:

  • PocketSphinx for speech recognition
  • Padatious for NLP intent detection
  • Mimic for text-to-speech
  • Adapt for intent parsing

The last session was about TensorFlow, but also more generally about Google’s experiences with AI, like how ML is used today.

And about how fundamental machine learning is today, with quotes like these:

  • Remember in 2010, when the hype was mobile-first? Hype was right. Machine Learning is similarly hyped now. Don’t get left behind
  • You must consider the user journey, the entire system. If users touch multiple components to solve a problem, transition must be seamless

Other pieces of advice were about finding talent and maintaining/growing/spreading ML inside your organization:

How to hire ML experts:

  1. don’t ask a Quant to figure out your business model
  2. design autonomy
  3. $$$ for compute & data acquisition
  4. Never done!

How to Grow ML practice:

  1. Find ML Ninja (SWE + PM)
  2. Do Project incubation
  3. Do ML office hours / consulting

How to spread the knowledge:

  1. Build ML guidelines
  2. Perform internal training
  3. Do open sourcing

And on ML algorithms project prioritization and execution:

  1. Pick algorithms based on the success metrics & data you can get
  2. Pick a simple one and invest 50% of time into building quality evaluation of the model
  3. Build an experiment framework for eval & release process
  4. Feedback loop

Overall the quality was good, even if I was really disappointed to discover in the morning that one of the most interesting sessions (with the legendary George Hotz!) had been cancelled.

Helping Troy Hunt for fun and profit

Hi everyone, I’m a huge fan of the security expert Troy Hunt and of his incredible free service haveibeenpwned.com (if you don’t know it, please use it now to test whether your email accounts have been compromised!).

Now Troy has created a contest where you can actually win a shiny Lenovo laptop if you build something “new” that helps people become more aware of the security risks related to pwned accounts.

I decided to participate, and my idea is the following: help all the people that have Gmail (and Hotmail/Outlook/Office 365, in alpha version!) accounts verify whether their friends, colleagues and family members have compromised email accounts.

I uploaded the code and executables here, and I strongly suggest you read the README instructions ENTIRELY to understand how the tool works, what the expected results are and what you can do.
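At its core, the check the tool performs for each contact is a single call to the public haveibeenpwned API. The real implementation is in the linked repository; the snippet below is only a minimal sketch of that kind of lookup, and it assumes the v2 endpoint available at the time of writing (no API key, but a User-Agent header required; the API has since evolved and is rate limited, so any real use must respect that):

using System;
using System.Net.Http;
using System.Threading.Tasks;

class PwnageChecker
{
    static readonly HttpClient client = new HttpClient();

    // Returns true if the account appears in at least one known breach.
    // Assumes the (old) v2 endpoint; newer API versions require an API key.
    static async Task<bool> IsPwnedAsync(string email)
    {
        var url = $"https://haveibeenpwned.com/api/v2/breachedaccount/{Uri.EscapeDataString(email)}";
        using (var request = new HttpRequestMessage(HttpMethod.Get, url))
        {
            request.Headers.UserAgent.ParseAdd("hibp-contact-checker-demo");
            var response = await client.SendAsync(request);
            if (response.StatusCode == System.Net.HttpStatusCode.NotFound)
                return false;   // account not found in any known breach
            response.EnsureSuccessStatusCode();
            return true;        // 200 OK: the body lists the breaches
        }
    }

    static async Task Main()
    {
        Console.WriteLine(await IsPwnedAsync("someone@example.com"));
    }
}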

Regardless of whether I win the laptop or not, I have already won, because thanks to this tool I was able to alert my wife and some of my friends about the danger and had the right “push” to convince them to set up two-factor authentication.

If you want to reward this effort, please donate directly to Troy here; he deserves a good beer!

 

A massive adventure ends…

Hi everyone, my journey at Microsoft ends, and I want to use this opportunity to revisit many of the good moments I spent in this great company.

In my first days after joining, it was like entering a new solar system where you have to learn all the names of the planets, moons and orbits at warp speed,

but at the same time I had all the support of my manager, my teammates and my mentor!

Very quickly it was time to go into action with my team and start working with clients, partners, developers and colleagues from different continents, teams and specializations.

We worked on labs and workshops helping customers and partners discover and ramp up on Microsoft Advanced Analytics capabilities,

and on customer engagements helping our clients exploit all the possibilities that the Azure cloud provides to reach their goals.

I certainly cannot forget the incredible Microsoft TechReady event in Seattle

and all the talented colleagues I had the opportunity to work with.

Finally, I want to say a massive thank you to my v-team: I will miss you!!

As one adventure ends, I am really excited to bring my passion and curiosity into a new one, joining a customer.

I can’t wait to learn more, improve and be part of the massive digital transformation journey that is in front of us.

 

 

 

AI is progressing at incredible speed!

Several people tend to think that the new AI technologies, like convolutional neural networks, recurrent neural networks, generative adversarial networks, etc., are used mainly by tech giants like Google and Microsoft; in reality many enterprises, like Zalando, Instacart and many others, are already using deep learning in production. Well-known deep learning frameworks like Keras, TensorFlow, CNTK, Caffe2, etc. are now finally reaching a larger audience.

Big data engines like Spark are finally able to drive deep learning workloads as well, and the first steps are being taken to make large deep neural network models fit inside small-CPU, low-memory, occasionally connected IoT devices.

Finally, new hardware has been built specifically for deep learning:

  1. https://www.microsoft.com/en-us/research/blog/microsoft-unveils-project-brainwave/
  2. https://cloud.google.com/blog/big-data/2017/05/an-in-depth-look-at-googles-first-tensor-processing-unit-tpu
  3. https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/

But the AI space never sleeps, and new solutions, frameworks and architectures are already arriving or under active development:

Ray: a new distributed execution framework that aims to replace the well-known Spark!

PyTorch and the fast.ai library, which want to compete with and beat all the existing deep learning frameworks.

To overcome one of the biggest problems in deep learning, the amount of training data required, a new framework called Snorkel has been designed to create brand new training data with little human interaction.

Finally, to improve the integration and performance of deep learning models with the applications that consume them, a new prediction-serving system called Clipper has been designed.

The speed of AI evolution is incredible; be prepared to see much more than this in the near future!

How to create the perfect Matchmaker with Bot Framework and Cognitive Services

Hi everyone, this time I want to showcase some of the many capabilities of Microsoft Cognitive Services using a “cupido” bot built with the Microsoft Bot Framework.

So what is the plan? Here are some ideas:

  • Leverage only Facebook as the channel! Why? Because with Facebook people are already “logged in”, and you can use the Messenger profile API to automatically retrieve the user’s details and, more importantly, their Facebook photo!
  • Since the Facebook photo is usually an image with a face, we can send it to the Vision and Face APIs to understand gender, age and a bunch of other interesting info without any user interaction (see the sketch after this list)!
  • With a Custom Vision model that we trained on some publicly available images, we can score whether a person looks like a super model or not 😉
  • Using all this info (age, gender, makeup, sunglasses, super model or not, hair color, etc.) collected from those calls, we can decide which candidates in our database are the right ones for our user and display those that fit according to our demo rules.
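To give an idea of how little code the Face API step requires, here is a minimal sketch of a “detect” call over a profile photo URL. The region in the endpoint, the subscription key and the attribute list are placeholders and assumptions, not the exact code of the demo bot:

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class FaceApiDemo
{
    static readonly HttpClient client = new HttpClient();

    // Calls the Face API "detect" operation on a public image URL and returns
    // the raw JSON: one entry per detected face with the requested attributes.
    static async Task<string> DetectAsync(string imageUrl)
    {
        // Placeholders: use your own region and subscription key.
        var endpoint = "https://westeurope.api.cognitive.microsoft.com/face/v1.0/detect"
                     + "?returnFaceAttributes=age,gender,hair,glasses,makeup,smile";
        using (var request = new HttpRequestMessage(HttpMethod.Post, endpoint))
        {
            request.Headers.Add("Ocp-Apim-Subscription-Key", "<YOUR_FACE_API_KEY>");
            request.Content = new StringContent($"{{\"url\":\"{imageUrl}\"}}", Encoding.UTF8, "application/json");
            var response = await client.SendAsync(request);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }

    static async Task Main()
    {
        Console.WriteLine(await DetectAsync("https://example.com/profile-photo.jpg"));
    }
}

The JSON response contains the requested attributes for each detected face, which the bot can then use to fill the user’s profile.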

Of course, at the beginning our database of profiles will be empty, but with the help of friends and colleagues we can quickly fill it and have fun during the demo.

So how does it look in practice?

Here is the first interaction: after we say hello, the bot immediately personalizes the experience with our Facebook data (photo, first name, last name, etc.) and asks if we want to participate in the experiment.

After we accept, it uses the described APIs to understand the image and calculate age, hair color, super model score, etc.

Yeah, I know my super model score is not really good, but let’s see if there are any matches for me anyway….

Of course the bot is smart enough to display the profile of my wife, otherwise I would have been in really big trouble :-).

Now I guess many of you have this question: how is the super model score calculated?

Well, I trained Microsoft’s Custom Vision service with 30+ photos of real models and 30+ photos of “normal people”, and after 4 iterations I already had 90% accuracy in detecting super models in photos 😉

Of course there are several things to consider here:

  1. The subject should be the focus of the picture
  2. Have sufficiently diverse images, angles, lighting and backgrounds
  3. Train with images that are similar (in quality) to the images that will be used for scoring

And for sure the super model pics have higher resolution, better lighting and good exposure compared to the photos of “normal” people like you and me, but for the purposes of this demo the results were very good.
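Once the model is trained, scoring a new photo is again just one REST call. Here is a minimal, illustrative sketch (not the demo bot’s exact code): the prediction URL and key are placeholders you copy from the Custom Vision portal, and the probability for each tag (e.g. a “supermodel” tag) comes back as JSON:

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class SuperModelScorer
{
    static readonly HttpClient client = new HttpClient();

    // Scores an image URL against a trained Custom Vision iteration and
    // returns the raw JSON with the probability of each tag.
    static async Task<string> ScoreAsync(string imageUrl)
    {
        // Placeholders: copy the prediction URL and key of your trained
        // iteration from the Custom Vision portal.
        var predictionUrl = "<PREDICTION_URL_FROM_CUSTOM_VISION_PORTAL>";
        using (var request = new HttpRequestMessage(HttpMethod.Post, predictionUrl))
        {
            request.Headers.Add("Prediction-Key", "<YOUR_PREDICTION_KEY>");
            request.Content = new StringContent($"{{\"Url\":\"{imageUrl}\"}}", Encoding.UTF8, "application/json");
            var response = await client.SendAsync(request);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }

    static async Task Main()
    {
        Console.WriteLine(await ScoreAsync("https://example.com/profile-photo.jpg"));
    }
}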

Another consideration is that you don’t always have to use natural language processing in bots (in our case we in fact skipped LUIS), because, especially if you are not developing a Q&A/support bot, users prefer buttons and a minimal amount of info to provide.

Imagine a bot that handles your Netflix subscription: you just want buttons like “activate/deactivate membership” (for when you go on vacation) and “recommendations for tonight”.
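For example, with the Bot Builder v3 C# SDK this kind of button-driven interaction is just a card with a few CardAction buttons; a minimal sketch (the titles and values are of course illustrative):

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Bot.Builder.Dialogs;
using Microsoft.Bot.Connector;

public static class MenuHelper
{
    // Sends a message with buttons instead of expecting free text.
    // 'context' is the IDialogContext that the Bot Builder v3 SDK passes to a dialog.
    public static async Task ShowMenuAsync(IDialogContext context)
    {
        var card = new HeroCard
        {
            Title = "Your subscription",
            Buttons = new List<CardAction>
            {
                new CardAction(ActionTypes.ImBack, "Pause membership", value: "pause"),
                new CardAction(ActionTypes.ImBack, "Resume membership", value: "resume"),
                new CardAction(ActionTypes.ImBack, "Recommendations for tonight", value: "recommend")
            }
        };

        var reply = context.MakeMessage();
        reply.Attachments = new List<Attachment> { card.ToAttachment() };
        await context.PostAsync(reply);
    }
}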

Another important thing to consider is bot analytics and understanding how your bot is performing; I use this great tool that under the covers relies on Azure Application Insights.

If instead you are in love with statistics, you can try a Jupyter notebook with the following template to analyze the Azure Application Insights metrics and events with your own custom code.

If you want to try the bot with all the telemetry setup already done, you can grab, compile and try the demo code (do not use this code in any production environment) that is available here, and if this is your first bot, start from this tutorial to understand the various pieces needed.

How to generate Terabytes of IOT data with Azure Data Lake Analytics

Hi everyone, during one of my projects I’ve been asked the following question:

I’m currently storing my IoT sensor data in Azure Data Lake for analysis and feature engineering, but I still have very few devices, so not a big amount of data, and I’m not able to understand how fast my queries and transformations will be when, with many more devices and months/years of sensor data, my data lake grows to several terabytes.

Well, in that case let’s quickly generate those terabytes of data using U-SQL!

Let’s assume that our data resembles the following:

deviceId, timestamp, sensorValue, …….

So for each IoT device we have a unique identifier called deviceId (let’s assume it is a combination of numbers and letters), a timestamp indicating, with millisecond precision, when the IoT event was generated, and finally the values of the sensors at that moment (temperature, speed, etc.).

The idea is the following: given a real deviceId, generate N “synthetic deviceIds” that all have the same data as the original device. So if we have, for example, 5 real deviceIds, each with 100,000,000 records (500,000,000 records in total), and we generate 1,000 synthetic deviceIds for each real deviceId, we will have 1,000 x 5 x 100,000,000 = 500,000,000,000 additional records.

But we can expand the amount of synthetic data even more by playing with time: for example, if our real data has events only for 2017, we can duplicate the entire dataset for all the years from 2006 to 2016 and end up with over 5,000,000,000,000 records.

Here is some sample C# code that generates the synthetic deviceIds; note the GetArrayOfSynteticDevices function, which will be executed from within the U-SQL script.
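The original snippet was embedded in the post and is not reproduced here, so what follows is only a minimal sketch of what such a function could look like (the suffix-based naming scheme and the SqlArray<string> return type are assumptions; any implementation returning a SQL.ARRAY<string> of new ids works with the script below):

using System.Linq;
using Microsoft.Analytics.Types.Sql;

namespace Microsoft.DataGenUtils
{
    public static class SyntheticData
    {
        // For a real deviceId, build an array of 'count' synthetic ids by appending
        // a numeric suffix; the U-SQL script can later EXPLODE this array.
        // (Method name spelled as referenced in the U-SQL script below.)
        public static SqlArray<string> GetArrayOfSynteticDevices(string deviceId, int count)
        {
            var devices = Enumerable.Range(1, count)
                                    .Select(i => $"{deviceId}_synthetic{i:D4}");
            return new SqlArray<string>(devices);
        }
    }
}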

Before using it we have to register the assembly in our Data Lake Analytics account and database (in my case the master one):

DROP ASSEMBLY IF EXISTS master.[Microsoft.DataGenUtils];
CREATE ASSEMBLY master.[Microsoft.DataGenUtils] FROM @"location of dll";

Now we can read the original IoT data and create the additional data:

REFERENCE ASSEMBLY master.[Microsoft.DataGenUtils];

@t0 =
    EXTRACT deviceid string,
            timeofevent DateTime,
            sensorvalue float
    FROM "2017/IOTRealData.csv"
    USING Extractors.Csv();

// Let's have the distinct list of all the real deviceIds
@t1 =
    SELECT DISTINCT deviceid
    FROM @t0;

// Let's calculate for each deviceId an array of 1000 synthetic devices
@t2 =
    SELECT deviceid,
           Microsoft.DataGenUtils.SyntheticData.GetArrayOfSynteticDevices(deviceid, 1000) AS SyntheticDevices
    FROM @t1;

// Let's assign to each array of synthetic devices the same data as the corresponding real device
@t3 =
    SELECT a.SyntheticDevices,
           de.timeofevent,
           de.sensorvalue
    FROM @t0 AS de
         INNER JOIN @t2 AS a
         ON de.deviceid == a.deviceid;

// Let's use the EXPLODE function to expand each array into individual records
@t1Exploded =
    SELECT emp AS deviceid,
           de.timeofevent,
           de.sensorvalue
    FROM @t3 AS de
         CROSS APPLY EXPLODE(de.SyntheticDevices) AS dp(emp);

// Now we can write the expanded data
OUTPUT @t1Exploded
TO "SyntheticData/2017/expanded_{*}.csv"
USING Outputters.Csv();

Once you have the expanded data for the entire year 2017, you can simply use the C# DateTime functions that add years, months or days to a date, apply them to the timeofevent column and write the new data to a new folder (for example SyntheticData/2016, SyntheticData/2015, etc.).
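As a minimal U-SQL sketch of that last step (reusing the @t1Exploded rowset from the script above; the output folder is just an example), the one-year shift could look like this:

// Shift every event back by one year and write it to a 2016 folder
@shifted2016 =
    SELECT deviceid,
           timeofevent.AddYears(-1) AS timeofevent,
           sensorvalue
    FROM @t1Exploded;

OUTPUT @shifted2016
TO "SyntheticData/2016/expanded_{*}.csv"
USING Outputters.Csv();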