I’ve been a proud Linux user since 2006, and in my geek life as a Linux user and advocate I’ve used more than 20 different Linux distros: from the days of compiling from stage 1 with Gentoo, to cracking a new Windows-based machine with an amazing Knoppix 3.8 LiveCD, to compiling the newest version of the kernel to extract the maximum performance from a PC with 256 MB of RAM and a light, minimalist desktop environment. Then I had the pleasure of being in charge of a complex platform where the main OS was Red Hat Enterprise Linux, and after two months working with it, I said: “Wow, this is another kind of Linux, ready for the enterprise.” Then I heard some great news: “Red Hat became the first billion-dollar Open Source company”, and I wrote a post about it. Then I found Fedora Linux, and I’m still happy with it. Then I wrote about why Jim (Red Hat’s CEO) and his team should create some critical partnerships to drive the Hadoop and Big Data market with a focus on the security of the platform. But right now, I think there’s an inflection point with the new release of Red Hat Enterprise Linux 7. Keep reading to see why I think RHEL 7 could change the path for the Big Data and Cloud Computing markets.
I have to say it: being in charge of development, deployment, marketing, business development, and customer success for a Polyglot platform is a constant challenge, but it is very rewarding when a happy client tells me:
“Marcos, you are doing a terrific job”, and I always answer: “You don’t have to thank me, you have to thank my team, because they are the rockstars. I just remove the obstacles and encourage them to keep doing an amazing job.”
So, I will give you some tips on how to improve yourself for this kind of platform.
Some days ago, I had the pleasure of talking with two Apache Cassandra experts. The first was Edward Capriolo, a Hadoop System Administrator at Media6Degrees, organizer of the NYC Cassandra User Group and NYC NoSQL Meetups, author of the incredible “Cassandra High Performance Cookbook”, and one of DataStax’s MVPs.
The second was none other than Jonathan Ellis, DataStax’s Chief Technology Officer and co-founder, who also leads the Apache Cassandra project.
Yes, I know. I’m a slightly crazy young man, pitching all these ideas to Jim Whitehurst, the current Red Hat CEO, and his management team, but who knows? Perhaps some of these ideas are not so crazy, and they could be implemented. But I will leave that responsibility to the board.
My ideas are focused on two key needs for many organizations and companies today:
- Apache Hadoop: the de facto platform for Big Data Analytics, and its relationship with Cloud Computing
- Internet Security: a serious problem today for companies, governments, and global organizations
- Apache Hadoop’s Security: a topic much discussed by customers, developers, and system engineers, and one that needs a solution right now
As I said in an earlier post, Crime as a Service grows exponentially every day in every country of the world, and every day new kinds of attacks and new ways to steal information come to light.
There is an amazing battle between organizations, companies, and hackers around the globe, and of course, to win this war, you have to choose your tools wisely. I blogged about Splunk Security; today it’s the turn of another big player in this field: Umbrella by OpenDNS.
As the title says, choosing an enterprise-level Massively Parallel Processing (MPP) database is a big headache for every Data Science Manager, basically because there are so many very good choices around the tech world.
There are many industries which are in total explosion: Real Estate, Marketing Analytics, Retail, Recruiting Services, Big Data Analytics. But these are the good guys. There are other guys who are using their deep knowledge of Security, Hacking, Cracking, and Phishing to take advantage of the popularity of these industries, cut a big slice of the pie, and make money from it. A new kind of business has been born: Crime as a Service (CaaS).
Many of you, my good Data Science fellows, have been hearing about Real-Time for several years now. But we are in the Era of Information and in the years of Big Data, and changes happen so quickly that you need to adapt very fast to handle the big wave of information. The same thing is happening in Analytics: if you can answer smarter questions in seconds, you will be able to react more quickly to these changes, and that really matters in these rushed times, my dear friends.
Yesterday I was reading a great blog post from Derrick Harris, the well-known technology journalist at GigaOM, where he made some good points about Spark, the great technology being developed by the AMPLab at the University of California, Berkeley. But it’s not just Spark: there are several good pieces of technology which are disrupting the Analytics field for good. I will try to cover some of my favorite platforms in this post, but I don’t want to repeat information, so I will write just a few notes and amazing quotes about each platform. Let’s begin.
Feeling like a Storyteller fan
When I was a kid, I sat down every afternoon at exactly 6:00 PM to watch another episode of a great series created by Jim Henson in 1988 called “The Storyteller”, where an old man (John Hurt) and his funny talking dog (Brian Henson) recreated the best fables from around the world. I watched and listened with such enjoyment that time passed quickly, and I began counting the hours until the next episode the following day. John’s voice was very quiet and full of kindness, and every child at that time loved all the stories told by the great storyteller.
You may be wondering why I began this way. There is a single answer: I get the same feeling when I see a great infographic or a data visualization created with great tools like R in combination with ggplot2, the great matplotlib library for Python, or the amazing Tableau platform, because for me, the work of a Data Scientist comes down to just one thing:
“To tell a great story behind numbers and facts, with amazing graphics that say more than simple words and sophisticated statistical methods. To make it simple: You have to be a great Storyteller like John.”
But how do you do that more quickly and easily? How do you build a great data visualization in a matter of seconds? As I wrote before: your time and your mind are the most important resources you have, and you have to use them wisely. So I have an answer for you, my friend: Use Tableau Software. You may now be asking the million-dollar question: WHY? Keep reading.
We are in the era of Real-Time Analytics
Data can come to us in so many forms that we sometimes fear this constant growth. But more important is what we can do with this data. It doesn’t matter if we have a vast quantity of data if we don’t know how to turn it into revenue for the company, and this is where Analytics plays a key role. And not just Analytics, but Real-Time Analytics.