Why Red Hat needs partnerships with Cloudera, MapR, Intel and OpenDNS

Yes, I know. I’m a slightly crazy young man, pitching all these ideas to Jim Whitehurst, the current Red Hat CEO, and his management team, but who knows? Perhaps some of these ideas are not so crazy and could be implemented. But I will leave that responsibility to the board.

My ideas are focused on three key needs for many organizations and companies today:

  • Apache Hadoop: the de facto platform for Big Data Analytics, and its relationship with Cloud Computing
  • Internet Security: a serious problem today for companies, governments and global organizations
  • Apache Hadoop’s Security: a much-discussed topic among customers, developers and system engineers, and one that needs a solution right now

Fighting Cybercrime with Splunk Security Analytics

Many industries are booming: Real Estate, Marketing Analytics, Retail, Recruiting Services, Big Data Analytics; but these are the good guys. Other players are using their deep knowledge of security, hacking, cracking and phishing to take advantage of the popularity of these industries, cut themselves a big slice of the pie and make money from it. A new kind of business has been born: Crime as a Service (CaaS).

We are in the era of Real-Time Analytics

Data can come to us in so many forms that we are sometimes afraid of its constant growth. But what matters more is what we can do with this data. Having a vast quantity of data is useless if we don’t know how to turn it into revenue for the company, and this is where Analytics plays a key role. And not just Analytics, but Real-Time Analytics.

Data Science paradigms

Don’t you know any Data Scientists? Here are my paradigms

Many professionals want to become Data Scientists (like me), but they often don’t know what current Data Scientists actually do in their work. I want to share with you some of the best-known Data Scientists, who love to share their knowledge with the world.

My little advice for young Data Scientists

Data Scientist is the sexy role for the next 20 years

This idea comes from Hal Varian, Chief Economist at Google, in an interview with McKinsey Quarterly. Varian and his team have helped make Google one of the most profitable companies in the world, with amazing numbers: 29.5 billion dollars in a year.

But these numbers would not have been possible without three main roles that Varian calls “Data Analyst”, “Statistician” and “Data Visualization Expert”, described by the executive as the “hot and sexy jobs”. Varian says: “These professionals are and will be the key to companies’ success in the coming years, especially in these difficult times, when it is very hard to make a business profitable”. So many companies are looking for a new kind of professional who can combine these three skills: the “Data Scientist”. If you do a simple search for “Data Scientist” on Indeed.com or Simplyhired.com, you can see the rising interest from companies in this unique kind of professional.

I want to be a Data Scientist. How can I prepare for the role?

This is a question that many young professionals (like me) have in mind: “I want to be a Data Scientist, but how can I obtain the knowledge required to accomplish this?” There are lots of whitepapers, books, articles and blogs, and lots of techniques and tools. So when a new professional faces this insane quantity of resources, a new question arises: “What now? There are so many books and tools. How can I begin?” That is the main topic of this post: drawing on my modest experience in this field, I want to help new professionals select good and useful content. OK, first, my list of books:

All these amazing books help me every day, because they are written for practitioners who use the techniques and tools described in them every day. Remember, this is my personal list; you can build your own by adding or removing books. I’m giving you a starting point, and you can decide how to follow it. I recommend the order given here, because the first book (Head First Data Analysis) does an amazing job of explaining, in a concise and clean way, the tricky and challenging problems a Data Analyst can face, thanks to its outstanding writing style. (Note: all the Head First books are incredibly useful.)

OK, I have the content. What about the tools?

I love Open Source, so all the tools I recommend are developed and improved every day under its principles:

  • Python: an amazing language with a concise and clean syntax, easy to learn and easily extensible, with lots of useful modules used by scientists, such as NumPy and Matplotlib
  • R: this amazing platform for statistical computing and data visualization has become the “lingua franca of statisticians” today, for many reasons.
  • Apache Hadoop and its ecosystem: the popular Open Source implementation of the MapReduce paradigm, based on a research paper published by Google engineers in 2004. This project has become one of the major trends today, together with “Big Data” and “NoSQL”. Many companies use this amazing platform for large-scale data processing (MapReduce) and distributed storage (HDFS): Yahoo! for social graph analysis, Rackspace for cross-data-center log processing, The New York Times for converting 4 TB of archive images to PDF files, VISA for large-scale transaction analysis, eHarmony for matchmaking, JP Morgan Chase for data processing in financial services, and many more examples that you can find on the Hadoop World 2009 site and on the 2011 edition. Many companies offer commercial versions of Apache Hadoop, such as Greenplum, the EMC division, with Greenplum HD; MapR Technologies with its M3 and M5 editions; and IBM with its BigInsights project. But for me, the leader in commercial support, training and even certification is Cloudera, the company founded in 2009 by Amr Awadallah, former VP of Engineering for Data Systems at Yahoo! (now Cloudera’s CTO); Jeff Hammerbacher, former manager of the Data Science team at Facebook (now Vice President of Products and Chief Scientist at Cloudera); Christophe Bisciglia; and Mike Olson (currently the company’s CEO). Olson was previously CEO of Sleepycat, makers of Berkeley DB, the open source embedded database engine, and then spent two years at Oracle as Vice President for Embedded Technologies after Oracle’s acquisition of Sleepycat in 2006.
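To make the MapReduce paradigm a bit more concrete, here is a minimal, single-machine sketch in Python of the classic word-count job. It only simulates the map, shuffle and reduce phases inside one process; it does not use Hadoop’s real API (Hadoop jobs are typically written in Java or run through Hadoop Streaming), and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, chosen just for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line.
    Lowercasing is an illustrative choice of this sketch."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group every value emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the grouped values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map -> shuffle -> reduce over a list of text lines (all in memory)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    documents = ["Big Data needs Hadoop", "Hadoop stores data in HDFS"]
    for word, count in sorted(word_count(documents).items()):
        print(word, count)
```

In a real Hadoop cluster, the framework runs many map tasks in parallel across machines, performs the shuffle and sort over the network, and reads its input from and writes its output to HDFS. This sketch keeps everything in memory on one machine, so it is only useful for understanding how data flows between the phases.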

Final Thoughts

The rise of the Data Scientist began with Jeff Hammerbacher, when he created and led the Data Team at Facebook. Nowadays, every company and organization is looking for this unique kind of professional to do three key things, as Michael E. Driscoll (co-founder and CEO of Metamarkets) said in his talk “Open Source Analytics Visualization and Predictive Modeling of Big Data with R” at OSCON 2009:

“We need professionals who are able to munge, model and visualize data”

So, it’s a great moment to develop these skills and, in that way, to be able to work on challenging problems and save your current or future CEO a lot of headaches. I leave it to you to decide how to use this information, and if you have any comments, please just send me an email.

Happy Hacking!!!