  • Hi and welcome to this data science news special!

  • As a data science professional or at least an enthusiast, you probably have Pandas in

  • your heart - Python’s primary library for data analysis and manipulation.

  • Okay.

  • What you may not have heard already is that Pandas 1.0.0 was officially released!

  • Although at first sight this latest version is not much different for the user than the

  • previous release starting with a 0: 0.25.3, there are plenty of enhanced features that

  • boost performance and lay a better foundation in the long run.

  • The developers present 1.0.0 as a stable version of pandas with a strengthened API, which has

  • also been cleaned of many prior version deprecations.

  • Here are the most notable improvements that come with 1.0.0.

  • One.

  • The dedicated string and Boolean data types. These features are still “experimental”,

  • which means that further improvements are expected to happen in the near future.

  • So, as of yet, pandas will not automatically assign “string” or “boolean” to your

  • data.

  • This can only happen if you explicitly specify dtype="string" or dtype="boolean" while

  • creating a new structure.

  • However, in the future, this may become the default way in which pandas treats data of

  • this type.

  • We’ll just have to wait and see.

  • Also, you must consider the benefit of having the new “string” data type.

  • For example, until now, pandas would treat a date value and a string value as “object”.

  • Using “string” allows you to distinguish between the two, so now you can select and

  • manipulate string data much more easily.

  • Which leads us to the second point worth mentioning.

  • Two.

  • The .select_dtypes() method is much quicker now!

  • It relies on vectorization instead of iterating over a loop.

  • So, you can run .select_dtypes("string") to pull all string columns, or .select_dtypes("boolean")

  • to retrieve the Boolean data from a DataFrame, provided that you have set them as such beforehand.

  • Three.

  • We now can enjoy the pandas.NA scalar that denotes missing values.

  • Using pandas.NA is a new concept in the scientific ecosystem of Python, and its goal is to provide

  • an indicator for missing values that can be used consistently and successfully across

  • data types.

  • That said, this feature is currently “experimental”, too.

  • The reason is that it is yet to be further verified how it will intertwine with the simultaneous

  • work of other packages such as NumPy.

  • Four.

  • A method that will convert the data types of columns containing such null values has

  • been introduced – .convert_dtypes().

  • Five.

  • The well-known .info() has been improved.

  • It is much more readable and this does help you to explore your data in a quicker and

  • more efficient way.

  • Six.

  • Now we also have “.to_markdown()” – this new method allows you to display a Series

  • or DataFrame object as a markdown table.

  • So overall, a lot has been done but mainly on the backend.

  • For everyday users like us, the development of clear data types, consistent with other

  • libraries is surely the most prominent improvement.

  • In any case, it is worth checking the official release notes for more information before

  • you start using 1.0.0.

  • There you can find out more about the changes related to using such features as the .sort_index()

  • or .sort_values() methods and many more.

  • Finally, note that you need at least Python 3.6.1 to use this new version.

  • If you are just starting to learn pandas, don’t forget to check the link in the description.

  • If not, ‘pip install --upgrade pandas’ and have fun!
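To try out the first four points from the video, here is a minimal sketch, assuming pandas 1.0 or later is installed. The sample data and the variable names are made up for illustration, and because these features are described as experimental, the exact behavior may differ between versions.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame(
    {
        # The new dtypes are opt-in: they must be requested explicitly.
        "name": pd.Series(["Ann", "Bob", None], dtype="string"),
        "active": pd.Series([True, None, False], dtype="boolean"),
        "score": [1.5, 2.0, None],
    }
)

# The opt-in columns report their own dtype names rather than "object".
dtype_names = df.dtypes.astype(str).to_dict()

# select_dtypes() can pick columns by the new dtype names.
string_cols = list(df.select_dtypes("string").columns)
boolean_cols = list(df.select_dtypes("boolean").columns)

# Missing entries in the new dtypes are represented by pd.NA.
missing_name = df["name"].iloc[2]
missing_is_na = missing_name is pd.NA

# pd.NA propagates through comparisons instead of giving True or False.
comparison = pd.NA == 1

# convert_dtypes() switches ordinary columns to NA-aware dtypes where it can.
plain = pd.DataFrame({"city": ["Oslo", None], "visits": [3, None]})
converted = plain.convert_dtypes()
converted_names = converted.dtypes.astype(str).to_dict()

print(dtype_names)
print(string_cols, boolean_cols)
print(missing_is_na, comparison)
print(converted_names)
```

In the last step, the "visits" column starts out as floats because of the missing entry, and convert_dtypes() turns it into the nullable Int64 type, so the whole numbers are kept as integers next to the missing value.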
