A Data Science Package: Part [3]

KARTHIKEYAN S
Statistical Breakdown
5 min read · Jan 17, 2021

Photo by Dlanor S on Unsplash

Part [1] and Part [2] of A Data Science Package should have given you some insight into data science, as well as a basic understanding of how to work in RStudio. Now let's look at Jupyter Notebook, an interface for the Python programming language, which will also help you build a practical understanding.

It's recommended to install Python and Jupyter Notebook via Anaconda so that you can follow along with the content below.

Python

Python is a high-level, object-oriented programming language with built-in data structures. Python’s simple, easy-to-learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.
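As a small illustration of the built-in data structures and standard-library modules mentioned above, here is a minimal sketch. The variable names and sample values are invented for the example.

```python
# statistics ships with Python's standard library; nothing to install.
import statistics

# Hypothetical sample data, made up for this illustration.
scores = [72, 85, 90, 85]                 # list: ordered and mutable
unique_scores = set(scores)               # set: duplicates are dropped
student = {"name": "Asha", "score": 90}   # dict: key-value pairs

average = statistics.mean(scores)

print(f"Average score: {average}")
print(f"Distinct scores: {sorted(unique_scores)}")
print(f"{student['name']} scored {student['score']}")
```

You can paste this into a notebook code cell once Jupyter is running.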

Jupyter Notebook

The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects. A notebook integrates code and its output into a single document that combines visualizations, narrative text, mathematical equations, and other rich media. Using Notebooks is now a major part of the data science workflow at companies across the globe.

The easiest way for a beginner to get started with Jupyter Notebooks is by installing Anaconda. Anaconda is the most widely used Python distribution for data science and comes pre-loaded with all the most popular libraries and tools. The instructions provided to install this are easy to follow.

You can run Jupyter via the shortcut Anaconda adds to your start menu. Once it opens, you will be presented with a dashboard that shows a list of directories. The dashboard is designed specifically for managing your Jupyter Notebooks.

The dashboard is where you explore, edit and create your notebooks. Jupyter starts up a local Python server to serve these apps to your web browser, making it essentially platform-independent and opening the door to easier sharing on the web.

(The dashboard can also be opened on any system from the command prompt, or the terminal on Unix systems, by entering the command jupyter notebook. In that case, the current working directory becomes the start-up directory.)

With Jupyter Notebook open in your browser, you may have noticed that the URL for the dashboard is something like http://localhost:8888/tree. Localhost is not a website, but indicates that the content is being served from your local machine: your own computer.
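To see what the parts of that address mean, the sketch below splits the example URL with Python's standard urllib.parse module. The port 8888 is simply the one quoted above; your own dashboard may use a different one.

```python
from urllib.parse import urlsplit

# The example dashboard address quoted in the text.
dashboard_url = "http://localhost:8888/tree"
parts = urlsplit(dashboard_url)

print(parts.hostname)  # the machine serving the page (your own computer)
print(parts.port)      # the port the local Jupyter server listens on
print(parts.path)      # the path of the dashboard's file listing
```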

So let's get started!

Browse to the folder in which you would like to create your first notebook, click the “New” drop-down button in the top-right and select “Python 3”.

What is an ipynb File?

Each .ipynb file is a text file that describes the contents of your notebook in a format called JSON. Each cell and its contents, including image attachments that have been converted into strings of text, is listed therein along with some metadata.
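To make the JSON structure concrete, here is a minimal sketch that builds a tiny notebook by hand with Python's json module and reads it back. The field names follow notebook format version 4 as I understand it; the cell contents are made up, and a notebook saved by Jupyter itself will carry more metadata than this.

```python
import json

# A hand-written, minimal notebook: one Markdown cell and one code cell.
# Cell contents are invented for this example.
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My first notebook"],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # not run yet, shown as In [ ]
            "metadata": {},
            "outputs": [],
            "source": ["print('Positive vibes!')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# This string is the kind of text an .ipynb file holds.
text = json.dumps(notebook, indent=1)

# Read it back and list each cell's type and source.
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
for cell in loaded["cells"]:
    print(cell["cell_type"], "->", "".join(cell["source"]))
```

Opening any .ipynb file in a plain text editor shows the same kind of layout.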

The Notebook Interface

Cells and kernels are important to understanding Jupyter and to what makes it more than just a word processor.

  • A kernel is a “computational engine” that executes the code contained in a notebook document.
  • A cell is a container for text to be displayed in the notebook or code to be executed by the notebook’s kernel.

Kernels

Behind every notebook runs a kernel. When you run a code cell, that code is executed within the kernel, and any output is returned to the cell to be displayed. The kernel’s state persists over time and between cells: it pertains to the document as a whole and not to individual cells.
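The idea of state shared between cells can be sketched with a toy model: a single dictionary stands in for the kernel's memory, and each string stands in for a cell. This is an illustration only. A real kernel is a separate process, and the names kernel_namespace, cell_1 and cell_2 are hypothetical.

```python
# Toy stand-in for a kernel: one namespace shared by every "cell".
kernel_namespace = {}

cell_1 = "x = 21"       # first cell defines a variable
cell_2 = "y = x * 2"    # second cell uses it

# Run the cells one after another in the same namespace.
exec(cell_1, kernel_namespace)
exec(cell_2, kernel_namespace)

# x was defined in one cell and is still available in the next.
print(kernel_namespace["y"])
```

In a real notebook the same thing happens when you define a variable in one cell and use it in another.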

And if we ever wish to reset things, there are several incredibly useful options from the Kernel menu:

  • Restart: restarts the kernel, clearing all variables and anything else that was defined.
  • Restart & Clear Output: same as above but will also wipe the output displayed below your code cells.
  • Restart & Run All: same as above but will also run all your cells in order from first to last.
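What a restart does to your variables can be sketched the same way, with a dictionary standing in for the kernel's memory. This is a simplified model, not how Jupyter is implemented, and run_cell is a hypothetical helper written for this example.

```python
def run_cell(source, namespace):
    """Run one 'cell' of source code in the given namespace."""
    exec(source, namespace)

# Before the restart: a cell defines a variable.
namespace = {}
run_cell("total = 10", namespace)
had_total_before = "total" in namespace

# Toy "Restart": throw the old namespace away and start empty.
namespace = {}

# A later cell that relies on the old variable now fails.
restart_error = None
try:
    run_cell("print(total)", namespace)
except NameError as error:
    restart_error = error
    print("After restart:", error)
```

This is why Restart & Run All is handy: it rebuilds the state by running every cell again from the top.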

If your kernel is ever stuck on a computation and you wish to stop it, you can choose the Interrupt option.

Choosing a Kernel

Jupyter gives you the option to change the kernel, and there are many to choose from.

There are kernels for different versions of Python, and also for over 100 languages including Java, C, and even Fortran. Data scientists may be particularly interested in the kernels for R and Julia, as well as both imatlab and the Calysto MATLAB Kernel for MATLAB.

Cells

Cells form the body of a notebook.

  • A code cell contains code to be executed in the kernel. When the code is run, the notebook displays the output below the code cell that generated it.
  • A Markdown cell contains text formatted using Markdown and displays its output in-place when the Markdown cell is run.

The first cell in a new notebook is always a code cell.

Let’s test it out with a classic hello world example: Type print('Positive vibes!') into the cell and click the run button in the toolbar above or press Ctrl + Enter. When we run the cell, its output is displayed below and the label to its left will have changed from In [ ] to In [1].

The “In” part of the label is simply short for “Input,” while the label number indicates when the cell was executed on the kernel — in this case the cell was executed first.
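For reference, the hello world cell described above is a single line. The second snippet is an optional extra cell to try next; the variable name greeting is just an illustration.

```python
# First cell: the one-line example from the text.
print('Positive vibes!')

# Optional second cell: store the text in a variable and transform it.
greeting = 'Positive vibes!'
print(greeting.upper())
```

If you run these as two separate cells in a fresh notebook, the labels should read In [1] and In [2].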

In a Jupyter Notebook, there is always one “active” cell highlighted with a border whose color denotes its current mode:

  • Green outline — cell is in “edit mode”
  • Blue outline — cell is in “command mode”

Keyboard Shortcuts

  • Press Esc to enter command mode and press Enter to enter edit mode.

Once in command mode:

  • Scroll up and down your cells with your Up and Down keys.
  • Press A or B to insert a new cell above or below the active cell.
  • M will transform the active cell to a Markdown cell.
  • Y will set the active cell to a code cell.
  • D + D (D twice) will delete the active cell.
  • Z will undo cell deletion.
  • Hold Shift and press Up or Down to select multiple cells at once. With multiple cells selected, Shift + M will merge your selection.
  • Ctrl + Shift + -, in edit mode, will split the active cell at the cursor.
  • You can also click and Shift + Click in the margin to the left of your cells to select them.

If you have followed along, you should now have a basic understanding of how to get started with coding in Python.

Have a great day!

KARTHIKEYAN S
Statistical Breakdown

Ambitious and hardworking individual, with keen interest in data science.