I have been working as a data specialist at a data consulting agency for two years. My career is still short, but it had a high-profile start: I have done data analysis and data modeling for renowned international corporations as well as corporations in this country.
The more experience I gain, the more I love what I do. One pattern I have noticed is that data is increasingly becoming a business asset. Companies have piled up heaps of data in their databases. Extracting that data, cleaning it, and distilling actionable insights from it is easier said than done.
Many organisations still seem to rely on SQL for this purpose, which is slow going because the toolbox offers only a limited number of analysis tools. And that is before mentioning the headache of debugging.
Meanwhile, data science programming languages like Python and R offer several lightweight and powerful off-the-shelf libraries. Pandas is one of these analysis tools, and with it I have cracked many complicated-looking calculations, which really impressed our clients. Using it nearly every day, I have become really good at it, and I am now the go-to person for any Pandas-related question.
One problem I have found with learning libraries like Pandas is that people learn each tool in isolation from tutorials. (By the way, Pandas has awesome documentation that doubles as a good tutorial.) Few are able to connect the dots. As a result, I often see brilliant data science enthusiasts spending time reinventing a wheel that already exists in Pandas and is generally faster.
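To make the "reinventing the wheel" point concrete, here is a minimal sketch. The `sales` table, its column names, and its numbers are all invented for illustration. It computes the total revenue per region twice: once with a hand-written loop, and once with the `groupby` idiom that Pandas already provides.

```python
import pandas as pd

# Hypothetical sales data, invented purely for illustration.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "east", "south"],
    "revenue": [120.0, 80.0, 60.0, 200.0, 40.0],
})

# Reinvented wheel: loop over the rows and accumulate totals in a dict.
totals_by_hand = {}
for _, row in sales.iterrows():
    region = row["region"]
    totals_by_hand[region] = totals_by_hand.get(region, 0.0) + row["revenue"]

# Idiomatic version: let Pandas do the grouping and the summing.
totals_idiomatic = sales.groupby("region")["revenue"].sum()

# Both approaches agree on the totals.
assert totals_idiomatic.to_dict() == totals_by_hand
```

The two versions give the same answer, but the `groupby` line is shorter, states the intent directly, and on large tables typically runs much faster than a Python-level loop over rows.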
To close this gap, I am happy to share in upcoming posts what I know about writing analysis code that reinvents no wheels and uses as many idioms from the toolbox as possible. I am also thinking of making YouTube videos and posting the links here to show you how I do it in action.
I hope everyone enjoys it here at Enjoyable Data Analysis.