02 February 2020

Why I Learned Data Engineering as a Data Scientist




A true personal story


Disclaimer: The opinions in this article are limited by the scope of my personal experience. Please do NOT take them as the only advice for planning your future career.

The first job I got after leaving academia was as a data scientist. I loved the opportunity to crunch numbers as a daily activity. But later I realised I needed to acquire experience in data engineering. Here’s my true story.

The first company I worked for was a small, fast-growing consulting firm at that time. I was the only data scientist there. The company was doing well, landing contracts from renowned Australian brands. The projects mostly involved taking data from the client’s data warehouse or data mart (occasionally from the source database, which is crazy), building customer views, and setting up online marketing campaigns, mainly by email.

My role was supposed to spice up the company’s products with artificial intelligence. In a couple of projects I developed customer clustering models. They group customers into natural clusters based on facts, including demographics, but focusing primarily on transactional interactions with the brands, for instance purchase frequencies and volumes. What the algorithm learns from the data tells the clients about the behaviour patterns of their customers and helps them tailor campaign messages to different cohorts.
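To make the clustering idea concrete, here is a minimal sketch, not the models I actually built. The customer names and numbers are made up, the two features (purchases per year and average spend) are my own simplification, and the plain k-means below skips the feature scaling that real data would need. In practice I would reach for a library such as scikit-learn rather than hand-rolling it.

```python
import math

# Hypothetical customers: (purchases per year, average spend per purchase).
customers = {
    "c1": (2, 30.0), "c2": (3, 25.0), "c3": (1, 40.0),
    "c4": (24, 35.0), "c5": (30, 28.0), "c6": (26, 32.0),
}


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain k-means, started from the first k points so the result is deterministic."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest_centroid(point, centroids)].append(point)
        # Move each centroid to the mean of its members (keep it if it has none).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


centroids = kmeans(list(customers.values()), k=2)
labels = {name: nearest_centroid(point, centroids) for name, point in customers.items()}
print(labels)
```

On this toy data the occasional buyers and the frequent buyers end up in separate clusters, which is the kind of cohort a campaign message could then be tailored to.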

Another type of model I built that is useful in marketing campaigns is churn prediction. Knowing how likely a customer is to leave gives the business the chance to offer promotions or discounts to retain the customers at risk.

It all sounds interesting, so what is the problem?

Statisticians always warn us: “Garbage in, garbage out”. The quality of the data asset is vital to a data science project. Interestingly, but probably not surprisingly, I found that management’s attitude to the value of data science models goes hand in hand with the maturity of the data they own.

On some occasions the model was built and deployed into production quite smoothly. On many others, it was built but we never heard from the clients about deployment. Sometimes the data was too poor to even dream of a valid model.

Similar things happened in my second job. At one point the stakeholders were interested in having a model to predict online traffic volume. Sadly, it never turned up on the company’s roadmap.

My story might be discouraging for those aspiring to be data scientists, but it happens for good reasons. In my opinion, the underlying reason is that data science modelling, as a newcomer, sits at the end of the data processing pipeline. Normally a pipeline reads data from source systems, transforms it, and stores it in a data warehouse to serve reporting views, data marts, and maybe machine learning models.
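The pipeline shape described above can be sketched in a few lines. This is only an illustration under my own assumptions: the raw rows, the table name `fact_purchase`, and the view name `customer_summary` are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a source system:
# (customer id, purchase date, amount as text).
SOURCE_ROWS = [
    ("c1", "2020-01-03", "19.90"),
    ("c1", "2020-01-17", "5.00"),
    ("c2", "2020-01-09", " 42.50 "),
    ("c3", "2020-01-11", ""),  # missing amount
]


def extract():
    """Read raw rows from the (pretend) source system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Trim whitespace, drop rows without an amount, and cast amounts to float."""
    clean = []
    for customer_id, purchase_date, amount in rows:
        amount = amount.strip()
        if not amount:
            continue
        clean.append((customer_id, purchase_date, float(amount)))
    return clean


def load(conn, rows):
    """Store clean rows in the warehouse table and expose a reporting view."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_purchase "
        "(customer_id TEXT, purchase_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_purchase VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE VIEW IF NOT EXISTS customer_summary AS "
        "SELECT customer_id, COUNT(*) AS purchases, SUM(amount) AS total_spend "
        "FROM fact_purchase GROUP BY customer_id"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(conn, transform(extract()))
summary = conn.execute(
    "SELECT customer_id, purchases, total_spend "
    "FROM customer_summary ORDER BY customer_id"
).fetchall()
print(summary)
```

Notice where a model would fit: it would read from something like `customer_summary`, after every other step has already run. That is exactly what I mean by sitting at the end of the pipeline.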

This topology has consequences that are easy to imagine. Developing a working model draws the least attention during planning meetings, even though people might talk about it a lot when brainstorming for a project. For many projects in this country, as far as I have observed, a machine learning application is nice to have but not essential. Presumably, this is largely influenced by today’s decision-makers, who received their education when an informative dashboard reporting business performance was the whole universe. The take-off of algorithm-based models in business will need patience and time.

While I was struggling to prove my value as a data scientist, people in another role were too busy to argue about their importance. They are the sometimes behind-the-scenes heroes who build the pipeline, the backbone of any project: the data engineers. This literally happened in both of my jobs. My data engineer colleagues at the consulting firm were involved in all projects. How my second job ended is an even better example to prove my point: I was made redundant after an internal restructuring. A few months later even my boss didn’t survive the changes, but the data engineer on my team kept his job until he quit for another one that interested him more.

Hopefully you find my story interesting and useful when considering your own career. I still strongly believe data scientist is the sexiest job of the 21st century, and I never stop acquiring knowledge for it. However, until this career fully pans out, gaining data engineering skills and experience helps you secure a job in the data industry.

Please let me know if my story resonates with you or if you disagree with my opinions. Any criticism is welcome. Leave your comments below. Peace.
