Planet Python
Last update: September 05, 2015 04:49 AM
September 04, 2015
Julien Danjou
Data validation in Python with voluptuous
Continuing my post series on the tools I use these days in Python, this time I would like to talk about a library I really like, named voluptuous.
It's no secret that handling data received from the outside world is a delicate affair: most of the time, your program has no guarantee that the incoming stream is valid and contains what is expected.
The robustness principle says you should be liberal in what you accept, though that is not always a good idea either. Whatever policy you choose, lax or strict, you need to decide how to process that data and apply the policy consistently.
That means the program needs to inspect the data it receives, check that everything it needs is there, complete what might be missing (e.g. set some defaults), transform some values, and maybe reject the data in the end.
Data validation
The first step is to validate the data, which means checking all the fields are
there and all the types are right or understandable (parseable). Voluptuous
provides a single interface for all that called a Schema.
>>> from voluptuous import Schema
>>> s = Schema({
... 'q': str,
... 'per_page': int,
... 'page': int,
... })
>>> s({"q": "hello"})
{'q': 'hello'}
>>> s({"q": "hello", "page": "world"})
voluptuous.MultipleInvalid: expected int for dictionary value @ data['page']
>>> s({"q": "hello", "unknown": "key"})
voluptuous.MultipleInvalid: extra keys not allowed @ data['unknown']
The argument to voluptuous.Schema should be the data structure that you
expect. Voluptuous accepts any kind of data structure, so it could also be a
simple string or an array of dicts of arrays of integers. You get the idea.
Here it's a dict with a few keys that, if present, should be validated as
certain types. By default, Voluptuous does not raise an error if some keys
are missing. However, extra keys in a dict are invalid by default. If
you want to allow extra keys, it is possible to specify it.
>>> from voluptuous import Schema
>>> s = Schema({"foo": str}, extra=True)
>>> s({"bar": 2})
{'bar': 2}
It is also possible to make some keys mandatory.
>>> from voluptuous import Schema, Required
>>> s = Schema({Required("foo"): str})
>>> s({})
voluptuous.MultipleInvalid: required key not provided @ data['foo']
You can create custom data types very easily. Voluptuous data types are
actually just functions that are called with one argument, the value, and that
should either return the value or raise an Invalid or ValueError exception.
>>> from voluptuous import Schema, Invalid
>>> def StringWithLength5(value):
... if isinstance(value, str) and len(value) == 5:
... return value
... raise Invalid("Not a string with 5 chars")
...
>>> s = Schema(StringWithLength5)
>>> s("hello")
'hello'
>>> s("hello world")
voluptuous.MultipleInvalid: Not a string with 5 chars
Most of the time, though, there is no need to create your own data types.
Voluptuous provides logical operators that, combined with a few other
provided primitives such as voluptuous.Length or voluptuous.Range, can express a
large range of validation schemes.
>>> from voluptuous import Schema, Length, All
>>> s = Schema(All(str, Length(min=3, max=5)))
>>> s("hello")
'hello'
>>> s("hello world")
voluptuous.MultipleInvalid: length of value must be at most 5
The voluptuous documentation has a
good set of examples that you can check to have a good overview of what you can
do.
Data transformation
What's important to remember is that each data type you use is a function that is called and must return a value if the input is considered valid. The returned value is what is actually used and returned after schema validation:
>>> import uuid
>>> from voluptuous import Schema
>>> def UUID(value):
... return uuid.UUID(value)
...
>>> s = Schema({"foo": UUID})
>>> data_converted = s({"foo": "uuid?"})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data['foo']
>>> data_converted = s({"foo": "8B7BA51C-DFF5-45DD-B28C-6911A2317D1D"})
>>> data_converted
{'foo': UUID('8b7ba51c-dff5-45dd-b28c-6911a2317d1d')}
By defining a custom UUID function that converts a value to a UUID, the
schema converts the string passed in the data to a Python UUID object –
validating the format at the same time.
Note a little trick here: it's not possible to use uuid.UUID directly in the
schema; otherwise Voluptuous would check that the data is actually an
instance of uuid.UUID:
>>> from voluptuous import Schema
>>> s = Schema({"foo": uuid.UUID})
>>> s({"foo": "8B7BA51C-DFF5-45DD-B28C-6911A2317D1D"})
voluptuous.MultipleInvalid: expected UUID for dictionary value @ data['foo']
>>> s({"foo": uuid.uuid4()})
{'foo': UUID('60b6d6c4-e719-47a7-8e2e-b4a4a30631ed')}
And that's not what is wanted here.
That mechanism is really neat for transforming, for example, strings into timestamps.
>>> import datetime
>>> from voluptuous import Schema
>>> def Timestamp(value):
... return datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
...
>>> s = Schema({"foo": Timestamp})
>>> s({"foo": '2015-03-03T12:12:12'})
{'foo': datetime.datetime(2015, 3, 3, 12, 12, 12)}
>>> s({"foo": '2015-03-03T12:12'})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data['foo']
Recursive schemas
So far, Voluptuous has one limitation: it has no direct support for recursive schemas. The simplest way to circumvent it is to use another function as an indirection.
>>> from voluptuous import Schema, Any
>>> def _MySchema(value):
... return MySchema(value)
...
>>> MySchema = Schema({"foo": Any("bar", _MySchema)})
>>> MySchema({"foo": {"foo": "bar"}})
{'foo': {'foo': 'bar'}}
>>> MySchema({"foo": {"foo": "baz"}})
voluptuous.MultipleInvalid: not a valid value for dictionary value @ data['foo']['foo']
Usage in REST API
I started to use Voluptuous to validate data in the REST API provided by Gnocchi. So far it has been a really good tool, and we've been able to create a complete REST API whose input is very easy to validate on the server side. I would definitely recommend it for that. It blends easily with any Web framework.
One of the upsides compared to solutions like JSON Schema is the ability to create or re-use your own custom data types while converting values at validation time. It is also very Pythonic and extensible, which makes it pretty great to use. It's also not tied to any serialization format.
On the other hand, JSON Schema is language agnostic and is serializable itself as JSON. That makes it easy to be exported and provided to a consumer so it can understand the API and validate the data potentially on its side.
Python 4 Kids
Python for Kids Book: Project 5
In these posts I outline the contents of each project in my book Python For Kids For Dummies. If you have questions or comments about the project listed in the title post them here. Any improvements will also be listed here.
What’s in Project 5
Project 5 introduces functions by revisiting the Guessing Game from Project 3 and recasting it using a function. The project covers the def keyword, calling a function, and the fact that a function must be defined before it can be called. The project also covers how to communicate with a function (both sending information to it by passing parameters and getting information from it using the return keyword). In order to define a function, you need to give it a name, so the project sets out naming rules for functions. You should also be documenting your code, so the project introduces docstrings, how to create them, what to put in them and how to use them.
The project illustrates a logical problem in the code and explains what a local variable is. It introduces the concept of constants defined in the body of a program that can be accessed by code within a function. A function which conducts the game round is put inside a while loop. The user interface is changed to allow the user to exit the loop. This involves creating a quit function which first checks with the user to confirm that they want to quit, then uses the break keyword to break out of the loop, or the continue keyword if the user aborts the quit. The sys module is introduced in order to use sys.exit.
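The shape of that game loop, with a quit function using break and continue, can be sketched like this (a minimal illustration of the pattern described, not the book's actual listing):

```python
import random

def check_guess(guess, secret):
    """Return feedback for a guess against the secret number."""
    if guess == secret:
        return "correct"
    return "too high" if guess > secret else "too low"

def confirm_quit(answer):
    """Return True only if the user's answer confirms quitting."""
    return answer.strip().lower().startswith("y")

def game():
    """One function per concern: the loop only wires things together."""
    secret = random.randint(1, 10)
    while True:
        choice = input("Enter a guess, or 'q' to quit: ")
        if choice == "q":
            if confirm_quit(input("Really quit? (y/n) ")):
                break      # break leaves the while loop: the game is over
            continue       # continue aborts the quit and starts a new turn
        if check_guess(int(choice), secret) == "correct":
            print("You got it!")
            break

# Call game() from the interactive prompt to play.
```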
Improvements:
The callout on Figure 5-4 should read “The right place for an argument.” <sigh>
Python for Kids Book: Project 6
In these posts I outline the contents of each project in my book Python For Kids For Dummies. If you have questions or comments about the project listed in the title post them here. Any improvements will also be listed here.
What’s in Project 6
Project 6 introduces the concept of objects and lists to create a program that converts text into hacker speak (i.e. text with numbers substituted for letters). The project revisits the creation of a simple my_message variable, then shows, using the dir builtin, that the variable has a variety of characteristics (that is, attributes) other than its value. It shows that one of the attributes, upper, is like a function; the functions of an object are called methods. It shows how to call a method or access an attribute through the dot notation.
In order to implement the hacker speak project, the program must perform a number of substitutions. To do that, it uses a list. The project discusses how to make a list, how to add elements to a list, how to iterate through a list and how to test whether something is in a list. It highlights a ‘gotcha’ with list methods – some of them modify the list in place without returning a value.
The code includes a logical error in the manner in which substitutions are made. A print statement is used to debug and identify the location of the error before it is fixed.
The project also gives a short introduction to IDLE’s debugger. There is a problem with IDLE’s debugger on Macs (there is no right click to set a breakpoint. Command-click works for some people, but not all), so if you’re running a Mac and command-click doesn’t work for you, you might have to skip this bit.
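The substitution idea, and the in-place list-method gotcha the project warns about, can be sketched like this (the substitution pairs are illustrative, not the book's exact list):

```python
# Each pair maps a letter to its hacker-speak digit.
SUBSTITUTIONS = [("e", "3"), ("a", "4"), ("l", "1"), ("t", "7"), ("o", "0")]

def to_hacker_speak(message):
    """Apply each substitution in the list to the message, in order."""
    for letter, digit in SUBSTITUTIONS:
        message = message.replace(letter, digit)
    return message

print(to_hacker_speak("hello world"))  # h3110 w0r1d

# Gotcha: list methods like append and sort modify the list in place
# and return None, so assigning their result throws the list away:
# substitutions = SUBSTITUTIONS.append(("s", "5"))   # substitutions is None!
```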
Python for Kids Book: Project 7
In these posts I outline the contents of each project in my book Python For Kids For Dummies. If you have questions or comments about the project listed in the title post them here. Any improvements will also be listed here.
What’s in Project 7
Project 7 (Cryptopy) introduces dictionaries as a means of encoding text using a Caesar cipher. Along the way, you are introduced to the string module and the characters in string.printable. There is some discussion about escape sequences and examples of \n and \t are given. Since you don’t want to encrypt escape sequences the slicing operator is introduced in order to take string.printable and slice off the control characters. This becomes the character set that will be encoded.
An encryption dictionary is created and each of the characters in a test message are encrypted then joined using the join method of the empty string. I explain why this is better than adding one character after another to an existing string. I create a matching decryption function and decryption dictionary and test the round trip (plaintext-> ciphertext -> plaintext).
The project introduces file operations by reading a message from a file then writing the encrypted (or decrypted) message to another file. This is first done with the base file operations open and close, then the with keyword is introduced to make the housekeeping a little easier.
I demonstrate how to use your newly written encryption functions from the command line by importing the code from your own file – your own third party module! In order for this to work seamlessly you are introduced to the __name__ attribute and the if __name__ == “__main__”: construction.
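Put together, the project's pipeline can be sketched like this (names and the key value are illustrative; the book's own code differs in detail):

```python
import string

# Slice off the trailing control characters of string.printable
# (\t, \n and friends) so escape sequences are never encrypted.
CHARACTERS = string.printable[:-5]
KEY = 3  # shift amount for the Caesar cipher

# Build the encryption dictionary and its matching decryption dictionary.
encrypt_dict = {c: CHARACTERS[(i + KEY) % len(CHARACTERS)]
                for i, c in enumerate(CHARACTERS)}
decrypt_dict = {v: k for k, v in encrypt_dict.items()}

def encrypt(plaintext):
    """Look up each character, then join with the empty string.

    ''.join over a sequence is better than += on a string in a loop,
    because strings are immutable and += copies on every step."""
    return "".join(encrypt_dict[c] for c in plaintext)

def decrypt(ciphertext):
    """Reverse the encryption using the matching dictionary."""
    return "".join(decrypt_dict[c] for c in ciphertext)

if __name__ == "__main__":
    # Importing this file from another script skips this demo block.
    message = "Meet me at noon"
    secret = encrypt(message)
    assert decrypt(secret) == message  # plaintext -> ciphertext -> plaintext
```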
Twisted Matrix Labs
Twisted 15.4 Released
On behalf of Twisted Matrix Laboratories, I am honoured to announce the release of Twisted 15.4, codenamed "Trial By Fire".
Twisted is continuing to forge ahead, with the highlights of this release being:
- twisted.positioning, the rest of twisted.internet.endpoints, KQueueReactor (for real this time), and twisted.web.proxy/guard have all been ported to Python 3.
- Trial has been ported to Python 3! This was made possible by a Python Software Foundation grant.
- Twisted officially supports several more platforms: Py3.4 on FreeBSD, Py2.7/Py3.4 on Fedora 21/22, and Py2.7 on RHEL7.
- Python 2.6 is no longer supported, and support for Debian 6 and RHEL 6 has been removed as a result.
- Support for the EOL'd platforms of Fedora 17/18/19 has been removed.
- Twisted has moved to requiring setuptools for installation.
- twisted.python.failure.Failure's __repr__ now includes the exception message.
- 19 tickets in total closed!
Many thanks to everyone who had a part in this release - the supporters of the Twisted Software Foundation, the developers who contributed code as well as documentation, and all the people building great things with Twisted!
Twisted Regards,
Amber "Hawkie" Brown
September 03, 2015
Import Python
ImportPython JobBoard

Are you looking for a Python developer, or are you a developer looking for a Python programming job? Check out the free Python JobBoard.
Job postings are featured in the newsletter, letting you reach 9,000+ Python developers globally.
Nikola
Nikola v7.7.0 is out!
On behalf of the Nikola team, I am pleased to announce the immediate availability of Nikola v7.7.0. It fixes some bugs and adds new features.
What is Nikola?
Nikola is a static site and blog generator, written in Python. It can use Mako and Jinja2 templates, and input in many popular markup formats, such as reStructuredText and Markdown — and can even turn Jupyter (IPython) Notebooks into blog posts! It also supports image galleries, and is multilingual. Nikola is flexible, and page builds are extremely fast, courtesy of doit (which is rebuilding only what has been changed).
Find out more at the website: https://getnikola.com/
Key Changes since v7.6.4
- Sections by Daniel Aleksandersen
- Author pages by Juanjo Conti
Downloads
Install using pip install Nikola or download tarballs on GitHub and PyPI.
Changes
Features
- New support for online CSS and JS minifying services (Issue #1999)
- Make tag optional with USE_BASE_TAG flag (Issue #1985)
- Render author pages (Issue #1972)
- Atom feeds for tag lists (Issue #1686)
- New THEME_COLOR option for customizing themes from a primary color (Issue #1980)
- New color_hsl_adjust_hex and colorize_str_from_base_color functions available in themes (Issue #1980)
- POSTS output subfolders now generate sections by default (Issue #1980)
- New POSTS_SECTIONS and POSTS_SECTION_* options for configuring the section pages (Issue #1980)
- For themers: each post is now associated with section_color, section_link, and section_name (Issue #1980)
- Each new section page has an auto-assigned color based on shifting the hue of THEME_COLOR by a hash of the section name; it can be overridden with the POSTS_SECTION_COLORS option (Issue #1980)
- New TAG_PAGES_TITLES and CATEGORY_PAGES_TITLES options (Issue #1962)
- Add Bosnian and Serbian (Latin) languages, by Saša Savić [bs, sr_latin]
- Add Portuguese (Portugal) language, by jamatos [pt]
Bugfixes
- Make nikola tabcompletion work outside sites (Issue #1983)
- Fix display of categories list in bootstrap theme (Issue #2002)
- If webassets is not installed, use unbundled assets (Issue #1992)
- Check links in Atom and sitemap files (Issue #1993)
- Link checker should check all absolute URLs to self (Issue #1991)
- Check img|source[@srcset] as part of check -l (Issue #1989)
- Clean up translations for third party components
- pagekind["main_index"] is set on the main indexes to differentiate them from all the other indexes
- Add dependency on metadata file for 2-file posts (Issue #1968)
- Set UTF-8 charset in Content-Type for text/ and +xml (Issue #1966)
New feature in Nikola: Sections
This post is reproduced with permission from the author. See it in the original site
Sections are like newspaper sections that let you group related content together in a collection. Every post from a section appears under a common name, folder/address, and can optionally use distinct styling. Sections also have their own landing pages containing an index of all their posts, and their own syndication feed. With sections and post collections, you can diversify your Nikola blog by writing on different topics all on the same website. Readers who are only interested in one subsection of the content you publish can subscribe to the feed of the section or sections that interest them the most.
In Nikola, sections are normally built automatically based on the output folders specified in the POSTS option. Each output folder is a new section. The index pages and feeds for each section will be output in the same directory as the posts. Alternatively, sections can be assigned using a section property in each post's metadata. Note that this will not change the output folder or address of a post, so you lose some of the uniformity you get from having posts include their section name as part of their address.
The following configuration example demonstrates how three sections on different topics are created. The first argument is the source path to where the posts are stored, the second argument is the output folder and section name, and the third argument is the template to use for each section. Posts can use the same template, but you may want to customize the template for each section with bigger hero images on your food section and special star rating systems and different HTML markup for your reviews.
POSTS = (
('posts/blog/*.md', 'blog', 'post.tmpl'),
('posts/food/*.md', 'food', 'post_recipe.tmpl'),
('posts/review/*.md', 'review', 'post_reviews.tmpl'),
)
Posts cannot be added to multiple sections, as this might create duplicate pages with different addresses. Duplicate pages are something you will want to avoid in most cases. If you really want a post to appear in multiple sections, you’re looking for Nikola’s tags or categories functionality.
Some customizations I’ve made to my own templates after reorganizing to use sections:
- Display the name and color of the section a post belongs to on the front page.
- Display a link to syndication feed for each section as well as the everything-feed at the top of each section and post belonging to that section.
- Breadcrumb navigations from posts to their sections and from the sections to the front page. Encourages visitors to your site to find more content from the same section.
Additionally, each section and every post in that section will be automatically assigned a color created by shifting the hue of the site’s THEME_COLOR option in the HUSL color space. This creates a visually similar color that can be used to style the posts and the sections in a uniform way, allowing each section to have a unique style of its own. The color can be called from a theme using post.section_color() and can be used in inline styles or a style element. The color manipulation functions can also be accessed directly in theme templates, allowing for shifting hue, saturation, or lightness of a given color. For example, a lighter version of a section’s color can be retrieved using color_hsl_adjust_hex( post.section_color(), adjust_l=0.05 ).
The options for controlling the behavior of sections are better documented in conf.py and include:
- POSTS_SECTIONS for enabling or disabling sections (on by default)
- POSTS_SECTION_ARE_INDEXES for making posts lists instead of indexes
- POSTS_SECTION_COLOR for manually assigning colors to sections rather than auto-generating them from THEME_COLOR
- POSTS_SECTION_NAME for naming sections separately from their output folders
- POSTS_SECTION_TITLE for controlling the title of the section indexes
- POSTS_SECTION_DESCRIPTION for giving each section a description
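As an illustration only (the values below are invented, and option names follow the list above; check conf.py's comments for the authoritative spelling), a section setup might look like:

```python
# Hypothetical conf.py fragment: enable sections and customize two of them.
POSTS_SECTIONS = True
POSTS_SECTION_ARE_INDEXES = False
POSTS_SECTION_NAME = {"food": "Food & Recipes"}
POSTS_SECTION_COLORS = {"review": "#aa4444"}  # overrides the auto-assigned hue
POSTS_SECTION_TITLE = {"review": "All reviews"}
POSTS_SECTION_DESCRIPTION = {"food": "Recipes and restaurant notes."}
```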
There is currently no way of generating a list of all sections. A site is not expected to need more than three–twelve sections at the most.
Sections will be available in Nikola version 7.7.0 due later this week.
Drop by the Nikola mailing list or chat room if you’ve built something cool with sections or just need a little help.
What on Earth is “Nikola,” anyway?
Nikola is a static site generator built in Python. What that means is that it can turn a collection of text files into a beautiful website using templates and a collection of ready-made themes. This website (even this very page!) was built using Nikola. Learn more at the Nikola website.
I’ve contributed to the development of Nikola for the last two years — the new sectioning system only in the last week — and I’m really happy with how Nikola works, the community, and especially how it has helped me build a great website that I’m really proud of.
Reinout van Rees
Automation for better behaviour
Now... that's a provocative title! In a sense, it is intended that way. Some behaviour is better than other behaviour. A value judgment! In the Netherlands, where I live, value judgments are suspect. If you have a comment on someone's behaviour, a common question is whether you're "better" than them. If you have a judgment, you apparently automatically think you've got a higher social status or a higher moral standing. And that's bad, apparently.
Well, I despise such thinking :-)
Absolute values
I think there are absolutes you can refer to, that you can compare to. Lofty goals you can try to accomplish. Obvious truths (which can theoretically be wrong...) that are recognized by many.
Nihilism is fine, btw. If you're a pure nihilist: I can respect that. It is an internally-logical viewpoint. Only you shouldn't complain if some other nihilist cuts you to pieces if that suits his purely-individual nihilistic purposes.
So for practical purposes I'm going to assume there's some higher goal/law/purpose/whatever that we should attain.
Take programming in python. PEP 8, python's official style guide, is recognized by most python programmers as the style guide they should adhere to. At least, nobody in my company complained when I adjusted/fixed their code to comply with PEP 8. And the addition of bin/pep8 in all of our software projects to make it easy to check for compliance didn't raise any protests. Pyflakes is even clearer, as it often points at real errors or obvious omissions.
For django projects, possible good things include:
- Sentry integration for nicely-accessible error logging.
- Using a recent and supported django version. So those 1.4 instances we still have at my workplace should go the way of the dodo.
- Using proper releases instead of using the latest master git checkout.
- Using migrations.
- Tests.
Automation is central to good behaviour
My take on good behaviour is that you should either make it easy to do the good thing or you should make non-good behaviour visible.
As an example, take python releases. As a manager you can say "thou shalt make good releases". Oh wow. An impressive display of power. It reminds me of a certain SF comic where, to teach them a lesson, an entire political assembly was threatened with obliteration from orbit. Needless to say, the strong words didn't have a measurable effect.
You can say the same words at a programmer meeting, of course. "Let's agree to make proper releases". Yes. Right.
What do you have to do for a proper release?
- Adjust the version in setup.py from 1.2.dev.0 to 1.2.
- Record the release date in the changelog.
- Tag the release.
- Update the version number in setup.py to 1.3.dev.0.
- Add a new header for 1.3 in the changelog.
Now... That's quite an amount of work. If I'm honest, I trust about 40% of my colleagues to make that effort every time they release a package.
There is a better way. Those very same colleagues can be relied on to make perfect releases all the time if all they have to do is to call bin/fullrelease and press ENTER a few times to do all of the above automatically. Thanks to zest.releaser.
Zest.releaser makes it easier and quicker to make good releases than it is to make bad/quick/sloppy releases by hand.
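To make the point concrete, the setup.py bookkeeping in the list above is mechanical enough to script. A rough sketch of that kind of edit (illustrative only, not zest.releaser's actual implementation):

```python
import re

def release_version(setup_py):
    """Strip the dev marker for the release: '1.2.dev.0' -> '1.2'."""
    return re.sub(r"version\s*=\s*(['\"])(\d+\.\d+)\.dev\.?\d*\1",
                  r"version=\g<1>\g<2>\g<1>", setup_py)

def next_dev_version(setup_py):
    """Bump to the next development version: '1.2' -> '1.3.dev.0'."""
    def bump(match):
        major, minor = match.group(2).split(".")
        return "version={q}{}.{}.dev.0{q}".format(
            major, int(minor) + 1, q=match.group(1))
    return re.sub(r"version\s*=\s*(['\"])(\d+\.\d+)\1", bump, setup_py)
```

In practice zest.releaser also records the changelog date, tags the release, and adds the new changelog header, which is exactly why delegating to bin/fullrelease beats doing the edits by hand.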
Further examples
Now... here are some further examples to get you thinking.
All of our projects are started with "nensskel", a tool to create a skeleton for a new project (python lib, django app, django site). It uses "paste script"; many people now use "cookie cutter", which serves the same purpose.
For all projects, a proper test setup is included. You can always run bin/test and your test case will run. You only have to fill it in.
bin/fullrelease, bin/pep8, bin/pyflakes: if you haven't yet installed those programs globally, how much easier can I make it for you to use them?
If you want to add documentation, sphinx is all set up for you. The docs/source/ directory is there and sphinx is automatically run every time you run buildout.
The README.rst has some easy do-this-do-that comments in there for when you've just started your project. Simple quick things like "add your name in the setup.py author field". And "add a one-line summary to the setup.py and add that same one to the github.com description".
I cannot make it much easier, right?
Now... quite a few projects still have this TODO list in their README.
Conclusion: you need automation to enable policy
You need automation to enable policy, but even that isn't enough. I cannot possibly automatically write a one-line summary for a just-generated project. So I have to make do with a TODO note in the README and in the setup.py. Which gets disregarded.
If even such simple things get disregarded, bigger things like "add a test" and "provide documentation" and "make sure there is a proper release script" will be hard to get right. I must admit to not always adding tests for functionality.
I'll hereby torture myself with a quote. "Unit testing is for programmers what washing your hands is for doctors before an operation". It is an essential part of your profession. If you go to the hospital, you don't expect to have to ask your doctor to disinfect the hands before the operation. That's expected. Likewise, you shouldn't expect your clients to explicitly ask you for software tests: those should be there by default!
Again, I admit to not always adding tests. That's bad. As a professional software developer I should make sure that at least 90% test coverage is considered normal at my company. In the cases where we measure it, coverage is probably around 50%. Which means "bad". Which also means "you're not measuring it all the time". 90% should also be normal for my own code and I also don't always attain that.
Our company-wide policy should be to get our test coverage to at least 90%. Whether or not that's our policy, we'll never make 90% if we don't measure it.
And that is the point I want to make. You need tools. You need automation. If you don't measure your test coverage, any developer or management policy statement will be effectively meaningless. If you have a jenkins instance that's seriously neglected (70% of the projects are red), you don't effectively have meaningful tests. Without a functioning jenkins instance (or travis-ci.org), you cannot properly say you're delivering quality software.
Without tooling and automation to prove your policy, your policy statements are effectively worthless. And that's quite a strong value statement :-)
Anatoly Techtonik
SCons build targets
def dump_targets(targets):
    for t in targets:
        if type(t) == str:
            name = t
        else:
            name = t.name
        print(" <" + str(t.__class__.__name__) + "> " + name)

print("[*] Default targets:")
dump_targets(DEFAULT_TARGETS)
print("[*] Command line targets:")
dump_targets(COMMAND_LINE_TARGETS)
print("[*] All build targets:")
dump_targets(BUILD_TARGETS)
[*] Default targets:
<Alias> wesnoth
<Alias> wesnothd
[*] Command line targets:
<str> .
[*] All build targets:
<str> .
scons
scons .
scons /
scons C:\ D:\
scons foo bar
scons -c .
scons -c build export
scons src/subdir
cd src/subdir
scons -u .
Jorgen Schäfer
Elpy 1.9.0 released
I just released version 1.9.0 of Elpy, the Emacs Python Development Environment. This is a feature release.
Elpy is an Emacs package to bring powerful Python editing to Emacs. It combines a number of other packages, both written in Emacs Lisp as well as Python.
Quick Installation
Evaluate this:
(require 'package)
(add-to-list 'package-archives
             '("elpy" .
               "https://jorgenschaefer.github.io/packages/"))
Then run M-x package-install RET elpy RET.
Finally, run the following (and add them to your .emacs):
(package-initialize)
(elpy-enable)
Changes in 1.9.0
- Elpy now supports the autopep8 library for automatically formatting Python code. All refactoring-related code is now grouped under C-c C-r. Use C-c C-r i to fix up imports using importmagic, C-c C-r p to fix up Python code with autopep8, and C-c C-r r to bring up the old Rope refactoring menu.
- C-c C-b will now select a region containing surrounding lines of the current indentation or more.
- C-c C-z in a Python shell will now switch back to the last Python buffer, allowing you to use the key to cycle back and forth between the Python buffer and shell.
- The pattern used for C-c C-s is now customizable in elpy-rgrep-file-pattern.
- <C-return> can now be used to send the current statement to the Python shell. Be careful, this can break with nested statements.
- The Elpy minor mode now also works in modes derived from python-mode, not just in the mode itself.
Thanks to ChillarAnand, raylu and Chedi for their contributions!
Codementor
Data Science with Python & R: Data Frames I

Motivation
This series of tutorials on Data Science will compare how different concepts in the discipline can be implemented in the two dominant ecosystems nowadays: R and Python. We will do this from a neutral point of view. Our opinion is that each environment has good and bad things, and any data scientist should know how to use both in order to be as prepared as possible for the job market or to start a personal project.
To get a feeling of what is going on regarding this hot topic, we refer the reader to DataCamp’s Data Science War infographic. Their infographic explores what the strengths of R are over Python and vice versa, and aims to provide a basic comparison between these two programming languages from a data science and statistics perspective.
Far from being a repetition of the previous work, our series of tutorials will go hands-on into how to actually perform different data science tasks such as working with data frames, doing aggregations, or creating different statistical models in the areas of supervised and unsupervised learning.
As usual, we will use real-world datasets. This will help us to quickly transfer what we learn here to actual data analysis situations.
The first tutorial in our series will deal with an important abstraction, that of a Data Frame. In the very next tutorial, we will introduce one of the first tasks we face once our data is loaded: Exploratory Data Analysis. That task can be performed using data frames and basic plots, as we will show here for both Python and R.
All the source code for the different parts of this series of tutorials and applications can be checked at GitHub. Feel free to get involved and share your progress with us!
What is a DataFrame?
A data frame is used for storing tabular data. It has labeled axes (rows and columns), and arithmetic operations align on both row and column labels.
The concept was introduced in R before it was in Python Pandas, so the latter repeats many of the ideas from the former. In R, a data.frame is a list of vector variables of the same number of elements (rows) with unique row names. That is, each column is a vector with an associated name, and each row is a series of vector elements that correspond to the same position in each of the column-vectors.
In Pandas, a DataFrame can be thought of as a dict-like container for Series objects, where a Series is a one-dimensional NumPy ndarray with axis labels (including time series). By default, each Series corresponds to a column in the resulting DataFrame.
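To see that dict-of-Series view concretely, here is a toy example (the data is made up, not taken from the Gapminder datasets used below):

```python
import pandas as pd

# Two Series sharing the same index; each one becomes a column.
deaths = pd.Series([20, 5], index=["Spain", "France"])
existing = pd.Series([120, 40], index=["Spain", "France"])

# The DataFrame is essentially a dict of Series aligned on their index.
df = pd.DataFrame({"deaths": deaths, "existing": existing})

# Rows are label-indexed; the dict keys become the column labels.
print(df.loc["Spain", "deaths"])  # 20
```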
But let’s see both data types in practice. First of all, we will introduce a dataset that will be used to explain the data frame creation process and the data analysis tasks that can be done with a data frame. Then we will have a separate section for each platform, repeating every task so that you can easily move from one to the other in the future.
Introducing Gapminder World datasets
The Gapminder website presents itself as a fact-based worldview. It is a comprehensive resource for data on indicators for different countries and territories. Its Data section contains a list of datasets that can be accessed as Google Spreadsheet pages (add &output=csv to download as CSV). Each indicator dataset is tagged with a Data provider, a Category, and a Subcategory.
For this tutorial, we will use different datasets related to Infectious Tuberculosis:
- All TB deaths per 100K
- TB estimated prevalence (existing cases) per 100K
- TB estimated incidence (new cases) per 100K
The first thing we need to do is download the files for later use within our R and Python environments. There is a description of each dataset if we click on its title in the list of datasets. When performing any data analysis task, it is essential to understand our data as much as possible, so go there and have a read. Basically, each cell in the dataset contains the number of tuberculosis cases per 100K people during the given year (column) for each country or region (row).
We will use these datasets to better understand TB incidence in different regions over time.
Downloading files and reading CSV
Python
Download Google Spreadsheet data as CSV.
import urllib  # Python 2; in Python 3 use urllib.request.urlretrieve instead
tb_deaths_url_csv = 'https://docs.google.com/spreadsheets/d/12uWVH_IlmzJX_75bJ3IH5E-Gqx6-zfbDKNvZqYjUuso/pub?gid=0&output=CSV'
tb_existing_url_csv = 'https://docs.google.com/spreadsheets/d/1X5Jp7Q8pTs3KLJ5JBWKhncVACGsg5v4xu6badNs4C7I/pub?gid=0&output=csv'
tb_new_url_csv = 'https://docs.google.com/spreadsheets/d/1Pl51PcEGlO9Hp4Uh0x2_QM0xVb53p2UDBMPwcnSjFTk/pub?gid=0&output=csv'
local_tb_deaths_file = 'tb_deaths_100.csv'
local_tb_existing_file = 'tb_existing_100.csv'
local_tb_new_file = 'tb_new_100.csv'
deaths_f = urllib.urlretrieve(tb_deaths_url_csv, local_tb_deaths_file)
existing_f = urllib.urlretrieve(tb_existing_url_csv, local_tb_existing_file)
new_f = urllib.urlretrieve(tb_new_url_csv, local_tb_new_file)
Read CSV into DataFrame by using read_csv().
import pandas as pd
deaths_df = pd.read_csv(local_tb_deaths_file, index_col = 0, thousands = ',').T
existing_df = pd.read_csv(local_tb_existing_file, index_col = 0, thousands = ',').T
new_df = pd.read_csv(local_tb_new_file, index_col = 0, thousands = ',').T
We have specified index_col to be 0 since we want the country names to be the row labels. We also specified the thousands separator to be ‘,’ so that Pandas automatically parses those cells as numbers. Then, we transpose the table with .T so that the time series for each country becomes a column.
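The effect of the thousands separator option can be seen on a toy CSV (hypothetical values):

```python
import io
import pandas as pd

csv_text = 'country,1990\nDjibouti,"1,485"\nDenmark,12\n'

# Without thousands=',', the quoted value "1,485" stays a string,
# so the whole column gets dtype object.
raw = pd.read_csv(io.StringIO(csv_text), index_col=0)
# With it, the commas are stripped and the column is parsed as integers.
parsed = pd.read_csv(io.StringIO(csv_text), index_col=0, thousands=',')
print(raw['1990'].dtype, parsed['1990'].dtype)
```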
We will concentrate on the existing cases for a while. We can use head() to check the first few lines.
existing_df.head()
| TB prevalence, all forms (per 100 000 population per year) | Afghanistan | Albania | Algeria | American Samoa | Andorra | Angola | Anguilla | Antigua and Barbuda | Argentina | Armenia | … | Uruguay | Uzbekistan | Vanuatu | Venezuela | Viet Nam | Wallis et Futuna | West Bank and Gaza | Yemen | Zambia | Zimbabwe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1990 | 436 | 42 | 45 | 42 | 39 | 514 | 38 | 16 | 96 | 52 | … | 35 | 114 | 278 | 46 | 365 | 126 | 55 | 265 | 436 | 409 |
| 1991 | 429 | 40 | 44 | 14 | 37 | 514 | 38 | 15 | 91 | 49 | … | 34 | 105 | 268 | 45 | 361 | 352 | 54 | 261 | 456 | 417 |
| 1992 | 422 | 41 | 44 | 4 | 35 | 513 | 37 | 15 | 86 | 51 | … | 33 | 102 | 259 | 44 | 358 | 64 | 54 | 263 | 494 | 415 |
| 1993 | 415 | 42 | 43 | 18 | 33 | 512 | 37 | 14 | 82 | 55 | … | 32 | 118 | 250 | 43 | 354 | 174 | 52 | 253 | 526 | 419 |
| 1994 | 407 | 42 | 43 | 17 | 32 | 510 | 36 | 13 | 78 | 60 | … | 31 | 116 | 242 | 42 | 350 | 172 | 52 | 250 | 556 | 426 |
5 rows × 207 columns
By using the attribute columns we can read and write column names.
existing_df.columns
Index([u'Afghanistan', u'Albania', u'Algeria', u'American Samoa', u'Andorra', u'Angola', u'Anguilla', u'Antigua and Barbuda', u'Argentina', u'Armenia', u'Australia', u'Austria', u'Azerbaijan', u'Bahamas', u'Bahrain', u'Bangladesh', u'Barbados', u'Belarus', u'Belgium', u'Belize', u'Benin', u'Bermuda', u'Bhutan', u'Bolivia', u'Bosnia and Herzegovina', u'Botswana', u'Brazil', u'British Virgin Islands', u'Brunei Darussalam', u'Bulgaria', u'Burkina Faso', u'Burundi', u'Cambodia', u'Cameroon', u'Canada', u'Cape Verde', u'Cayman Islands', u'Central African Republic', u'Chad', u'Chile', u'China', u'Colombia', u'Comoros', u'Congo, Rep.', u'Cook Islands', u'Costa Rica', u'Croatia', u'Cuba', u'Cyprus', u'Czech Republic', u'Cote dIvoire', u'Korea, Dem. Rep.', u'Congo, Dem. Rep.', u'Denmark', u'Djibouti', u'Dominica', u'Dominican Republic', u'Ecuador', u'Egypt', u'El Salvador', u'Equatorial Guinea', u'Eritrea', u'Estonia', u'Ethiopia', u'Fiji', u'Finland', u'France', u'French Polynesia', u'Gabon', u'Gambia', u'Georgia', u'Germany', u'Ghana', u'Greece', u'Grenada', u'Guam', u'Guatemala', u'Guinea', u'Guinea-Bissau', u'Guyana', u'Haiti', u'Honduras', u'Hungary', u'Iceland', u'India', u'Indonesia', u'Iran', u'Iraq', u'Ireland', u'Israel', u'Italy', u'Jamaica', u'Japan', u'Jordan', u'Kazakhstan', u'Kenya', u'Kiribati', u'Kuwait', u'Kyrgyzstan', u'Laos', ...], dtype='object')
Similarly, we can access row names by using index.
existing_df.index
Index([u'1990', u'1991', u'1992', u'1993', u'1994', u'1995', u'1996', u'1997', u'1998', u'1999', u'2000', u'2001', u'2002', u'2003', u'2004', u'2005', u'2006', u'2007'], dtype='object')
We will use them to assign proper names to our columns and index.
deaths_df.index.names = ['year']
deaths_df.columns.names = ['country']
existing_df.index.names = ['year']
existing_df.columns.names = ['country']
new_df.index.names = ['year']
new_df.columns.names = ['country']
existing_df
| country | Afghanistan | Albania | Algeria | American Samoa | Andorra | Angola | Anguilla | Antigua and Barbuda | Argentina | Armenia | … | Uruguay | Uzbekistan | Vanuatu | Venezuela | Viet Nam | Wallis et Futuna | West Bank and Gaza | Yemen | Zambia | Zimbabwe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| year | |||||||||||||||||||||
| 1990 | 436 | 42 | 45 | 42 | 39 | 514 | 38 | 16 | 96 | 52 | … | 35 | 114 | 278 | 46 | 365 | 126 | 55 | 265 | 436 | 409 |
| 1991 | 429 | 40 | 44 | 14 | 37 | 514 | 38 | 15 | 91 | 49 | … | 34 | 105 | 268 | 45 | 361 | 352 | 54 | 261 | 456 | 417 |
| 1992 | 422 | 41 | 44 | 4 | 35 | 513 | 37 | 15 | 86 | 51 | … | 33 | 102 | 259 | 44 | 358 | 64 | 54 | 263 | 494 | 415 |
| 1993 | 415 | 42 | 43 | 18 | 33 | 512 | 37 | 14 | 82 | 55 | … | 32 | 118 | 250 | 43 | 354 | 174 | 52 | 253 | 526 | 419 |
| 1994 | 407 | 42 | 43 | 17 | 32 | 510 | 36 | 13 | 78 | 60 | … | 31 | 116 | 242 | 42 | 350 | 172 | 52 | 250 | 556 | 426 |
| 1995 | 397 | 43 | 42 | 22 | 30 | 508 | 35 | 12 | 74 | 68 | … | 30 | 119 | 234 | 42 | 346 | 93 | 50 | 244 | 585 | 439 |
| 1996 | 397 | 42 | 43 | 0 | 28 | 512 | 35 | 12 | 71 | 74 | … | 28 | 111 | 226 | 41 | 312 | 123 | 49 | 233 | 602 | 453 |
| 1997 | 387 | 44 | 44 | 25 | 23 | 363 | 36 | 11 | 67 | 75 | … | 27 | 122 | 218 | 41 | 273 | 213 | 46 | 207 | 626 | 481 |
| 1998 | 374 | 43 | 45 | 12 | 24 | 414 | 36 | 11 | 63 | 74 | … | 28 | 129 | 211 | 40 | 261 | 107 | 44 | 194 | 634 | 392 |
| 1999 | 373 | 42 | 46 | 8 | 22 | 384 | 36 | 9 | 58 | 86 | … | 28 | 134 | 159 | 39 | 253 | 105 | 42 | 175 | 657 | 430 |
| 2000 | 346 | 40 | 48 | 8 | 20 | 530 | 35 | 8 | 52 | 94 | … | 27 | 139 | 143 | 39 | 248 | 103 | 40 | 164 | 658 | 479 |
| 2001 | 326 | 34 | 49 | 6 | 20 | 335 | 35 | 9 | 51 | 99 | … | 25 | 148 | 128 | 41 | 243 | 13 | 39 | 154 | 680 | 523 |
| 2002 | 304 | 32 | 50 | 5 | 21 | 307 | 35 | 7 | 42 | 97 | … | 27 | 144 | 149 | 41 | 235 | 275 | 37 | 149 | 517 | 571 |
| 2003 | 308 | 32 | 51 | 6 | 18 | 281 | 35 | 9 | 41 | 91 | … | 25 | 152 | 128 | 39 | 234 | 147 | 36 | 146 | 478 | 632 |
| 2004 | 283 | 29 | 52 | 9 | 19 | 318 | 35 | 8 | 39 | 85 | … | 23 | 149 | 118 | 38 | 226 | 63 | 35 | 138 | 468 | 652 |
| 2005 | 267 | 29 | 53 | 11 | 18 | 331 | 34 | 8 | 39 | 79 | … | 24 | 144 | 131 | 38 | 227 | 57 | 33 | 137 | 453 | 680 |
| 2006 | 251 | 26 | 55 | 9 | 17 | 302 | 34 | 9 | 37 | 79 | … | 25 | 134 | 104 | 38 | 222 | 60 | 32 | 135 | 422 | 699 |
| 2007 | 238 | 22 | 56 | 5 | 19 | 294 | 34 | 9 | 35 | 81 | … | 23 | 140 | 102 | 39 | 220 | 25 | 31 | 130 | 387 | 714 |
R
In R we use read.csv to read CSV files into data.frame variables. Although the R function read.csv can work with URLs, https URLs are a problem for R in many cases, so you need to use a package like RCurl to work around it.
library(RCurl)
## Loading required package: bitops
existing_cases_file <- getURL("https://docs.google.com/spreadsheets/d/1X5Jp7Q8pTs3KLJ5JBWKhncVACGsg5v4xu6badNs4C7I/pub?gid=0&output=csv")
existing_df <- read.csv(text = existing_cases_file, row.names=1, stringsAsFactors=F)
str(existing_df)
## 'data.frame': 207 obs. of 18 variables:
## $ X1990: chr "436" "42" "45" "42" ...
## $ X1991: chr "429" "40" "44" "14" ...
## $ X1992: chr "422" "41" "44" "4" ...
## $ X1993: chr "415" "42" "43" "18" ...
## $ X1994: chr "407" "42" "43" "17" ...
## $ X1995: chr "397" "43" "42" "22" ...
## $ X1996: int 397 42 43 0 28 512 35 12 71 74 ...
## $ X1997: int 387 44 44 25 23 363 36 11 67 75 ...
## $ X1998: int 374 43 45 12 24 414 36 11 63 74 ...
## $ X1999: int 373 42 46 8 22 384 36 9 58 86 ...
## $ X2000: int 346 40 48 8 20 530 35 8 52 94 ...
## $ X2001: int 326 34 49 6 20 335 35 9 51 99 ...
## $ X2002: int 304 32 50 5 21 307 35 7 42 97 ...
## $ X2003: int 308 32 51 6 18 281 35 9 41 91 ...
## $ X2004: chr "283" "29" "52" "9" ...
## $ X2005: chr "267" "29" "53" "11" ...
## $ X2006: chr "251" "26" "55" "9" ...
## $ X2007: chr "238" "22" "56" "5" ...
The str() function in R gives us information about a variable's type. In this case
we can see that, due to the ',' thousands separator,
some of the columns haven't been parsed as numbers but as characters.
If we want to work properly with our dataset, we need to convert them to numbers.
Once we know a bit more about indexing and mapping functions, I promise you will be
able to understand the following piece of code. By now, let's just say that we convert
each affected column and assign it back to its reference in the data frame.
existing_df[c(1,2,3,4,5,6,15,16,17,18)] <-
lapply( existing_df[c(1,2,3,4,5,6,15,16,17,18)],
function(x) { as.integer(gsub(',', '', x) )})
str(existing_df)
## 'data.frame': 207 obs. of 18 variables:
## $ X1990: int 436 42 45 42 39 514 38 16 96 52 ...
## $ X1991: int 429 40 44 14 37 514 38 15 91 49 ...
## $ X1992: int 422 41 44 4 35 513 37 15 86 51 ...
## $ X1993: int 415 42 43 18 33 512 37 14 82 55 ...
## $ X1994: int 407 42 43 17 32 510 36 13 78 60 ...
## $ X1995: int 397 43 42 22 30 508 35 12 74 68 ...
## $ X1996: int 397 42 43 0 28 512 35 12 71 74 ...
## $ X1997: int 387 44 44 25 23 363 36 11 67 75 ...
## $ X1998: int 374 43 45 12 24 414 36 11 63 74 ...
## $ X1999: int 373 42 46 8 22 384 36 9 58 86 ...
## $ X2000: int 346 40 48 8 20 530 35 8 52 94 ...
## $ X2001: int 326 34 49 6 20 335 35 9 51 99 ...
## $ X2002: int 304 32 50 5 21 307 35 7 42 97 ...
## $ X2003: int 308 32 51 6 18 281 35 9 41 91 ...
## $ X2004: int 283 29 52 9 19 318 35 8 39 85 ...
## $ X2005: int 267 29 53 11 18 331 34 8 39 79 ...
## $ X2006: int 251 26 55 9 17 302 34 9 37 79 ...
## $ X2007: int 238 22 56 5 19 294 34 9 35 81 ...
Everything looks fine now. But still our dataset is a bit tricky. If we have a
look at what we got into the data frame with head
head(existing_df,3)
## X1990 X1991 X1992 X1993 X1994 X1995 X1996 X1997 X1998 X1999
## Afghanistan 436 429 422 415 407 397 397 387 374 373
## Albania 42 40 41 42 42 43 42 44 43 42
## Algeria 45 44 44 43 43 42 43 44 45 46
## X2000 X2001 X2002 X2003 X2004 X2005 X2006 X2007
## Afghanistan 346 326 304 308 283 267 251 238
## Albania 40 34 32 32 29 29 26 22
## Algeria 48 49 50 51 52 53 55 56
and nrow and ncol
nrow(existing_df)
## [1] 207
ncol(existing_df)
## [1] 18
we see that we have a data frame with 207 observations, one for each country, and 18 variables or features, one for each year. This doesn’t seem the most natural shape for this dataset. It is very unlikely that we will add new countries (observations or rows in this case) to the dataset, while it is quite possible to add additional years (variables or columns in this case). If we keep it like this, we will end up with a dataset that grows in features and not in observations, and that seems counterintuitive (and impractical depending on the analysis we want to do).
We won’t need to do this preprocessing all the time, but there we go. Thankfully, R has a function t(), similar to the method T in Pandas, that allows us to transpose a data.frame variable. The result is given as a matrix, so we need to convert it back to a data frame using as.data.frame.
# we will save the original (untransposed) version for later use if needed
existing_df_t <- existing_df
existing_df <- as.data.frame(t(existing_df))
head(existing_df,3)
## Afghanistan Albania Algeria American Samoa Andorra Angola Anguilla
## X1990 436 42 45 42 39 514 38
## X1991 429 40 44 14 37 514 38
## X1992 422 41 44 4 35 513 37
## Antigua and Barbuda Argentina Armenia Australia Austria Azerbaijan
## X1990 16 96 52 7 18 58
## X1991 15 91 49 7 17 55
## X1992 15 86 51 7 16 57
## Bahamas Bahrain Bangladesh Barbados Belarus Belgium Belize Benin
## X1990 54 120 639 8 62 16 65 140
## X1991 53 113 623 8 54 15 64 138
## X1992 52 108 608 7 59 15 62 135
## Bermuda Bhutan Bolivia Bosnia and Herzegovina Botswana Brazil
## X1990 10 924 377 160 344 124
## X1991 10 862 362 156 355 119
## X1992 9 804 347 154 351 114
## British Virgin Islands Brunei Darussalam Bulgaria Burkina Faso
## X1990 32 91 43 179
## X1991 30 91 48 196
## X1992 28 91 54 208
## Burundi Cambodia Cameroon Canada Cape Verde Cayman Islands
## X1990 288 928 188 7 449 10
## X1991 302 905 199 7 438 10
## X1992 292 881 200 7 428 9
## Central African Republic Chad Chile China Colombia Comoros
## X1990 318 251 45 327 88 188
## X1991 336 272 41 321 85 177
## X1992 342 282 38 315 82 167
## Congo, Rep. Cook Islands Costa Rica Croatia Cuba Cyprus
## X1990 209 0 30 126 32 14
## X1991 222 10 28 123 29 13
## X1992 231 57 27 121 26 13
## Czech Republic Cote d'Ivoire Korea, Dem. Rep. Congo, Dem. Rep.
## X1990 22 292 841 275
## X1991 22 304 828 306
## X1992 22 306 815 327
## Denmark Djibouti Dominica Dominican Republic Ecuador Egypt
## X1990 12 1,485 24 183 282 48
## X1991 12 1,477 24 173 271 47
## X1992 11 1,463 24 164 259 47
## El Salvador Equatorial Guinea Eritrea Estonia Ethiopia Fiji Finland
## X1990 133 169 245 50 312 68 14
## X1991 126 181 245 50 337 65 12
## X1992 119 187 242 56 351 62 11
## France French Polynesia Gabon Gambia Georgia Germany Ghana Greece
## X1990 21 67 359 350 51 15 533 30
## X1991 20 55 340 350 48 15 519 29
## X1992 19 91 325 349 50 14 502 27
## Grenada Guam Guatemala Guinea Guinea-Bissau Guyana Haiti Honduras
## X1990 7 103 113 241 404 39 479 141
## X1991 7 101 111 248 403 43 464 133
## X1992 7 96 108 255 402 34 453 128
## Hungary Iceland India Indonesia Iran Iraq Ireland Israel Italy
## X1990 67 5 586 443 50 88 19 11 11
## X1991 68 4 577 430 51 88 18 10 10
## X1992 70 4 566 417 56 88 18 10 10
## Jamaica Japan Jordan Kazakhstan Kenya Kiribati Kuwait Kyrgyzstan
## X1990 10 62 19 95 125 1,026 89 90
## X1991 10 60 18 87 120 1,006 84 93
## X1992 10 58 17 85 134 986 80 93
## Laos Latvia Lebanon Lesotho Liberia Libyan Arab Jamahiriya Lithuania
## X1990 428 56 64 225 476 46 64
## X1991 424 57 64 231 473 45 66
## X1992 420 59 63 229 469 45 71
## Luxembourg Madagascar Malawi Malaysia Maldives Mali Malta Mauritania
## X1990 19 367 380 159 143 640 10 585
## X1991 18 368 376 158 130 631 9 587
## X1992 17 369 365 156 118 621 9 590
## Mauritius Mexico Micronesia, Fed. Sts. Monaco Mongolia Montserrat
## X1990 53 101 263 3 477 14
## X1991 51 93 253 3 477 14
## X1992 50 86 244 3 477 14
## Morocco Mozambique Myanmar Namibia Nauru Nepal Netherlands
## X1990 134 287 411 650 170 629 11
## X1991 130 313 400 685 285 607 10
## X1992 127 328 389 687 280 585 10
## Netherlands Antilles New Caledonia New Zealand Nicaragua Niger
## X1990 28 112 10 145 317
## X1991 27 107 10 137 318
## X1992 25 104 9 129 319
## Nigeria Niue Northern Mariana Islands Norway Oman Pakistan Palau
## X1990 282 118 142 8 40 430 96
## X1991 307 115 201 8 36 428 66
## X1992 321 113 301 8 29 427 43
## Panama Papua New Guinea Paraguay Peru Philippines Poland Portugal
## X1990 74 498 95 394 799 88 51
## X1991 73 498 93 368 783 87 49
## X1992 71 497 92 343 766 86 47
## Puerto Rico Qatar Korea, Rep. Moldova Romania Russian Federation
## X1990 17 71 223 105 118 69
## X1991 15 69 196 99 125 64
## X1992 17 69 174 103 134 70
## Rwanda Saint Kitts and Nevis Saint Lucia
## X1990 190 17 26
## X1991 211 17 26
## X1992 226 16 25
## Saint Vincent and the Grenadines Samoa San Marino
## X1990 45 36 9
## X1991 45 35 9
## X1992 44 34 8
## Sao Tome and Principe Saudi Arabia Senegal Seychelles Sierra Leone
## X1990 346 68 380 113 465
## X1991 335 60 379 110 479
## X1992 325 59 379 106 492
## Singapore Slovakia Slovenia Solomon Islands Somalia South Africa
## X1990 52 55 66 625 597 769
## X1991 52 56 62 593 587 726
## X1992 53 59 59 563 577 676
## Spain Sri Lanka Sudan Suriname Swaziland Sweden Switzerland
## X1990 44 109 409 109 629 5 14
## X1991 42 106 404 100 590 5 13
## X1992 40 104 402 79 527 6 12
## Syrian Arab Republic Tajikistan Thailand Macedonia, FYR Timor-Leste
## X1990 94 193 336 92 706
## X1991 89 162 319 90 694
## X1992 84 112 307 89 681
## Togo Tokelau Tonga Trinidad and Tobago Tunisia Turkey Turkmenistan
## X1990 702 139 45 17 49 83 105
## X1991 687 140 44 17 46 79 99
## X1992 668 143 43 17 49 77 101
## Turks and Caicos Islands Tuvalu Uganda Ukraine United Arab Emirates
## X1990 42 593 206 67 47
## X1991 40 573 313 64 44
## X1992 37 554 342 67 42
## United Kingdom Tanzania Virgin Islands (U.S.)
## X1990 9 215 30
## X1991 9 228 28
## X1992 10 240 27
## United States of America Uruguay Uzbekistan Vanuatu Venezuela
## X1990 7 35 114 278 46
## X1991 7 34 105 268 45
## X1992 7 33 102 259 44
## Viet Nam Wallis et Futuna West Bank and Gaza Yemen Zambia Zimbabwe
## X1990 365 126 55 265 436 409
## X1991 361 352 54 261 456 417
## X1992 358 64 54 263 494 415
Row names are roughly equivalent to what we get in Pandas when we use the .index attribute of a data frame.
rownames(existing_df)
## [1] "X1990" "X1991" "X1992" "X1993" "X1994" "X1995" "X1996" "X1997"
## [9] "X1998" "X1999" "X2000" "X2001" "X2002" "X2003" "X2004" "X2005"
## [17] "X2006" "X2007"
In our data frame we see we have weird names for them: every year is prefixed with an X. This is because they started out as column names. From the definition of a data.frame in R, we know that each column is a vector with a variable name, and a name in R cannot start with a digit, so R automatically prefixes such numbers with the letter X. Right now we will leave them as they are, since it doesn’t really stop us from doing our analysis.
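For comparison, if X-prefixed labels like these ever ended up in a Pandas index, stripping them is a one-liner (a hypothetical sketch; nothing in our Python section above needs this):

```python
import pandas as pd

# Hypothetical X-prefixed labels mirroring the R row names above.
idx = pd.Index(['X1990', 'X1991', 'X1992'])
# lstrip('X') removes the leading 'X' from every label.
clean = pd.Index([label.lstrip('X') for label in idx])
print(list(clean))
```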
In the case of column names, they pretty much correspond to Pandas .columns attribute in a data frame.
colnames(existing_df)
## [1] "Afghanistan" "Albania"
## [3] "Algeria" "American Samoa"
## [5] "Andorra" "Angola"
## [7] "Anguilla" "Antigua and Barbuda"
## [9] "Argentina" "Armenia"
## [11] "Australia" "Austria"
## [13] "Azerbaijan" "Bahamas"
## [15] "Bahrain" "Bangladesh"
## [17] "Barbados" "Belarus"
## [19] "Belgium" "Belize"
## [21] "Benin" "Bermuda"
## [23] "Bhutan" "Bolivia"
## [25] "Bosnia and Herzegovina" "Botswana"
## [27] "Brazil" "British Virgin Islands"
## [29] "Brunei Darussalam" "Bulgaria"
## [31] "Burkina Faso" "Burundi"
## [33] "Cambodia" "Cameroon"
## [35] "Canada" "Cape Verde"
## [37] "Cayman Islands" "Central African Republic"
## [39] "Chad" "Chile"
## [41] "China" "Colombia"
## [43] "Comoros" "Congo, Rep."
## [45] "Cook Islands" "Costa Rica"
## [47] "Croatia" "Cuba"
## [49] "Cyprus" "Czech Republic"
## [51] "Cote d'Ivoire" "Korea, Dem. Rep."
## [53] "Congo, Dem. Rep." "Denmark"
## [55] "Djibouti" "Dominica"
## [57] "Dominican Republic" "Ecuador"
## [59] "Egypt" "El Salvador"
## [61] "Equatorial Guinea" "Eritrea"
## [63] "Estonia" "Ethiopia"
## [65] "Fiji" "Finland"
## [67] "France" "French Polynesia"
## [69] "Gabon" "Gambia"
## [71] "Georgia" "Germany"
## [73] "Ghana" "Greece"
## [75] "Grenada" "Guam"
## [77] "Guatemala" "Guinea"
## [79] "Guinea-Bissau" "Guyana"
## [81] "Haiti" "Honduras"
## [83] "Hungary" "Iceland"
## [85] "India" "Indonesia"
## [87] "Iran" "Iraq"
## [89] "Ireland" "Israel"
## [91] "Italy" "Jamaica"
## [93] "Japan" "Jordan"
## [95] "Kazakhstan" "Kenya"
## [97] "Kiribati" "Kuwait"
## [99] "Kyrgyzstan" "Laos"
## [101] "Latvia" "Lebanon"
## [103] "Lesotho" "Liberia"
## [105] "Libyan Arab Jamahiriya" "Lithuania"
## [107] "Luxembourg" "Madagascar"
## [109] "Malawi" "Malaysia"
## [111] "Maldives" "Mali"
## [113] "Malta" "Mauritania"
## [115] "Mauritius" "Mexico"
## [117] "Micronesia, Fed. Sts." "Monaco"
## [119] "Mongolia" "Montserrat"
## [121] "Morocco" "Mozambique"
## [123] "Myanmar" "Namibia"
## [125] "Nauru" "Nepal"
## [127] "Netherlands" "Netherlands Antilles"
## [129] "New Caledonia" "New Zealand"
## [131] "Nicaragua" "Niger"
## [133] "Nigeria" "Niue"
## [135] "Northern Mariana Islands" "Norway"
## [137] "Oman" "Pakistan"
## [139] "Palau" "Panama"
## [141] "Papua New Guinea" "Paraguay"
## [143] "Peru" "Philippines"
## [145] "Poland" "Portugal"
## [147] "Puerto Rico" "Qatar"
## [149] "Korea, Rep." "Moldova"
## [151] "Romania" "Russian Federation"
## [153] "Rwanda" "Saint Kitts and Nevis"
## [155] "Saint Lucia" "Saint Vincent and the Grenadines"
## [157] "Samoa" "San Marino"
## [159] "Sao Tome and Principe" "Saudi Arabia"
## [161] "Senegal" "Seychelles"
## [163] "Sierra Leone" "Singapore"
## [165] "Slovakia" "Slovenia"
## [167] "Solomon Islands" "Somalia"
## [169] "South Africa" "Spain"
## [171] "Sri Lanka" "Sudan"
## [173] "Suriname" "Swaziland"
## [175] "Sweden" "Switzerland"
## [177] "Syrian Arab Republic" "Tajikistan"
## [179] "Thailand" "Macedonia, FYR"
## [181] "Timor-Leste" "Togo"
## [183] "Tokelau" "Tonga"
## [185] "Trinidad and Tobago" "Tunisia"
## [187] "Turkey" "Turkmenistan"
## [189] "Turks and Caicos Islands" "Tuvalu"
## [191] "Uganda" "Ukraine"
## [193] "United Arab Emirates" "United Kingdom"
## [195] "Tanzania" "Virgin Islands (U.S.)"
## [197] "United States of America" "Uruguay"
## [199] "Uzbekistan" "Vanuatu"
## [201] "Venezuela" "Viet Nam"
## [203] "Wallis et Futuna" "West Bank and Gaza"
## [205] "Yemen" "Zambia"
## [207] "Zimbabwe"
These two functions show a common idiom in R, where the same function is used both to get a value and to assign it. For example, if we want to change column names we will do something like:
colnames(existing_df) <- new_col_names
But as we said, we will leave them as they are for now.
Data Indexing
Python
There is a whole section devoted to indexing and selecting data in DataFrames in the official documentation. Let’s apply them to our Tuberculosis cases dataframe.
We can access each data frame Series object by using its column name, as with a Python dictionary. In our case we can access each country series by its name.
existing_df['United Kingdom']
year
1990 9
1991 9
1992 10
1993 10
1994 9
1995 9
1996 9
1997 9
1998 9
1999 9
2000 9
2001 9
2002 9
2003 10
2004 10
2005 11
2006 11
2007 12
Name: United Kingdom, dtype: int64
Or simply by using the column name as an attribute.
existing_df.Spain
year
1990 44
1991 42
1992 40
1993 37
1994 35
1995 34
1996 33
1997 30
1998 30
1999 28
2000 27
2001 26
2002 26
2003 25
2004 24
2005 24
2006 24
2007 23
Name: Spain, dtype: int64
Or we can access multiple series by passing their column names as a Python list.
existing_df[['Spain', 'United Kingdom']]
| country | Spain | United Kingdom |
|---|---|---|
| year | ||
| 1990 | 44 | 9 |
| 1991 | 42 | 9 |
| 1992 | 40 | 10 |
| 1993 | 37 | 10 |
| 1994 | 35 | 9 |
| 1995 | 34 | 9 |
| 1996 | 33 | 9 |
| 1997 | 30 | 9 |
| 1998 | 30 | 9 |
| 1999 | 28 | 9 |
| 2000 | 27 | 9 |
| 2001 | 26 | 9 |
| 2002 | 26 | 9 |
| 2003 | 25 | 10 |
| 2004 | 24 | 10 |
| 2005 | 24 | 11 |
| 2006 | 24 | 11 |
| 2007 | 23 | 12 |
We can also access individual cells as follows.
existing_df.Spain['1990']
44
Or use any Python list slicing notation for slicing the series.
existing_df[['Spain', 'United Kingdom']][0:5]
| country | Spain | United Kingdom |
|---|---|---|
| year | ||
| 1990 | 44 | 9 |
| 1991 | 42 | 9 |
| 1992 | 40 | 10 |
| 1993 | 37 | 10 |
| 1994 | 35 | 9 |
With the whole DataFrame, slicing inside of [] slices the rows. This is provided largely as a convenience since it is such a common operation.
existing_df[0:5]
| country | Afghanistan | Albania | Algeria | American Samoa | Andorra | Angola | Anguilla | Antigua and Barbuda | Argentina | Armenia | … | Uruguay | Uzbekistan | Vanuatu | Venezuela | Viet Nam | Wallis et Futuna | West Bank and Gaza | Yemen | Zambia | Zimbabwe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| year | |||||||||||||||||||||
| 1990 | 436 | 42 | 45 | 42 | 39 | 514 | 38 | 16 | 96 | 52 | … | 35 | 114 | 278 | 46 | 365 | 126 | 55 | 265 | 436 | 409 |
| 1991 | 429 | 40 | 44 | 14 | 37 | 514 | 38 | 15 | 91 | 49 | … | 34 | 105 | 268 | 45 | 361 | 352 | 54 | 261 | 456 | 417 |
| 1992 | 422 | 41 | 44 | 4 | 35 | 513 | 37 | 15 | 86 | 51 | … | 33 | 102 | 259 | 44 | 358 | 64 | 54 | 263 | 494 | 415 |
| 1993 | 415 | 42 | 43 | 18 | 33 | 512 | 37 | 14 | 82 | 55 | … | 32 | 118 | 250 | 43 | 354 | 174 | 52 | 253 | 526 | 419 |
| 1994 | 407 | 42 | 43 | 17 | 32 | 510 | 36 | 13 | 78 | 60 | … | 31 | 116 | 242 | 42 | 350 | 172 | 52 | 250 | 556 | 426 |
5 rows × 207 columns
Indexing in production Python code
As stated in the official documentation, the Python and NumPy indexing operators [] and attribute operator . provide quick and easy access to pandas data structures across a wide range of use cases. This makes interactive work intuitive, as there’s little new to learn if you already know how to deal with Python dictionaries and NumPy arrays. However, since the type of the data to be accessed isn’t known in advance, directly using standard operators has some optimization limits. For production code, it is recommended that you take advantage of the optimized pandas data access methods exposed in this section.
For example, the .iloc indexer can be used for position-based access.
existing_df.iloc[0:2]
| country | Afghanistan | Albania | Algeria | American Samoa | Andorra | Angola | Anguilla | Antigua and Barbuda | Argentina | Armenia | … | Uruguay | Uzbekistan | Vanuatu | Venezuela | Viet Nam | Wallis et Futuna | West Bank and Gaza | Yemen | Zambia | Zimbabwe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| year | |||||||||||||||||||||
| 1990 | 436 | 42 | 45 | 42 | 39 | 514 | 38 | 16 | 96 | 52 | … | 35 | 114 | 278 | 46 | 365 | 126 | 55 | 265 | 436 | 409 |
| 1991 | 429 | 40 | 44 | 14 | 37 | 514 | 38 | 15 | 91 | 49 | … | 34 | 105 | 268 | 45 | 361 | 352 | 54 | 261 | 456 | 417 |
2 rows × 207 columns
While .loc is used for label access.
existing_df.loc['1992':'2005']
| country | Afghanistan | Albania | Algeria | American Samoa | Andorra | Angola | Anguilla | Antigua and Barbuda | Argentina | Armenia | … | Uruguay | Uzbekistan | Vanuatu | Venezuela | Viet Nam | Wallis et Futuna | West Bank and Gaza | Yemen | Zambia | Zimbabwe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| year | |||||||||||||||||||||
| 1992 | 422 | 41 | 44 | 4 | 35 | 513 | 37 | 15 | 86 | 51 | … | 33 | 102 | 259 | 44 | 358 | 64 | 54 | 263 | 494 | 415 |
| 1993 | 415 | 42 | 43 | 18 | 33 | 512 | 37 | 14 | 82 | 55 | … | 32 | 118 | 250 | 43 | 354 | 174 | 52 | 253 | 526 | 419 |
| 1994 | 407 | 42 | 43 | 17 | 32 | 510 | 36 | 13 | 78 | 60 | … | 31 | 116 | 242 | 42 | 350 | 172 | 52 | 250 | 556 | 426 |
| 1995 | 397 | 43 | 42 | 22 | 30 | 508 | 35 | 12 | 74 | 68 | … | 30 | 119 | 234 | 42 | 346 | 93 | 50 | 244 | 585 | 439 |
| 1996 | 397 | 42 | 43 | 0 | 28 | 512 | 35 | 12 | 71 | 74 | … | 28 | 111 | 226 | 41 | 312 | 123 | 49 | 233 | 602 | 453 |
| 1997 | 387 | 44 | 44 | 25 | 23 | 363 | 36 | 11 | 67 | 75 | … | 27 | 122 | 218 | 41 | 273 | 213 | 46 | 207 | 626 | 481 |
| 1998 | 374 | 43 | 45 | 12 | 24 | 414 | 36 | 11 | 63 | 74 | … | 28 | 129 | 211 | 40 | 261 | 107 | 44 | 194 | 634 | 392 |
| 1999 | 373 | 42 | 46 | 8 | 22 | 384 | 36 | 9 | 58 | 86 | … | 28 | 134 | 159 | 39 | 253 | 105 | 42 | 175 | 657 | 430 |
| 2000 | 346 | 40 | 48 | 8 | 20 | 530 | 35 | 8 | 52 | 94 | … | 27 | 139 | 143 | 39 | 248 | 103 | 40 | 164 | 658 | 479 |
| 2001 | 326 | 34 | 49 | 6 | 20 | 335 | 35 | 9 | 51 | 99 | … | 25 | 148 | 128 | 41 | 243 | 13 | 39 | 154 | 680 | 523 |
| 2002 | 304 | 32 | 50 | 5 | 21 | 307 | 35 | 7 | 42 | 97 | … | 27 | 144 | 149 | 41 | 235 | 275 | 37 | 149 | 517 | 571 |
| 2003 | 308 | 32 | 51 | 6 | 18 | 281 | 35 | 9 | 41 | 91 | … | 25 | 152 | 128 | 39 | 234 | 147 | 36 | 146 | 478 | 632 |
| 2004 | 283 | 29 | 52 | 9 | 19 | 318 | 35 | 8 | 39 | 85 | … | 23 | 149 | 118 | 38 | 226 | 63 | 35 | 138 | 468 | 652 |
| 2005 | 267 | 29 | 53 | 11 | 18 | 331 | 34 | 8 | 39 | 79 | … | 24 | 144 | 131 | 38 | 227 | 57 | 33 | 137 | 453 | 680 |
14 rows × 207 columns
And we can combine that with series indexing by column.
existing_df.loc[['1992','1998','2005'],['Spain','United Kingdom']]
| country | Spain | United Kingdom |
|---|---|---|
| 1992 | 40 | 10 |
| 1998 | 30 | 9 |
| 2005 | 24 | 11 |
This last approach is the recommended one when using Pandas data frames, especially when doing assignments (something we are not doing here). Otherwise, we might run into assignment problems as described here.
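The assignment problem mentioned above is the classic chained-indexing pitfall; a minimal sketch on a toy frame (made-up values):

```python
import pandas as pd

# Toy frame with made-up values mirroring our dataset's shape.
df = pd.DataFrame({'Spain': [44, 42], 'United Kingdom': [9, 9]},
                  index=['1990', '1991'])

# Chained indexing like df['Spain']['1990'] = 0 may act on a temporary
# copy (raising SettingWithCopyWarning); .loc assigns on the frame itself.
df.loc['1990', 'Spain'] = 0
print(df.loc['1990', 'Spain'])
```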
R
Similarly to what we do in Pandas (actually Pandas was inspired by R), we can
access a data.frame column by its position.
existing_df[,1]
## X1990 X1991 X1992 X1993 X1994 X1995 X1996 X1997 X1998 X1999 X2000 X2001
## 436 429 422 415 407 397 397 387 374 373 346 326
## X2002 X2003 X2004 X2005 X2006 X2007
## 304 308 283 267 251 238
## 17 Levels: 238 251 267 283 304 308 326 346 373 374 387 397 407 415 ... 436
Position-based indexing in R uses the first element for the row number and
the second for the column number. If either is left blank, we are telling R to get all
the rows/columns. In the previous example we retrieved all the rows for the first
column (Afghanistan) in the data.frame. And yes, R has a 1-based indexing
scheme.
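Note that the equivalent positional lookup in Pandas is 0-based; for the R expression existing_df[,1], the counterpart would be something like this (a sketch on a toy frame with made-up values):

```python
import pandas as pd

# Toy frame; column 0 plays the role of Afghanistan.
df = pd.DataFrame({'Afghanistan': [436, 429], 'Albania': [42, 40]},
                  index=['1990', '1991'])

# .iloc is 0-based: [:, 0] means "all rows, first column",
# the counterpart of R's existing_df[,1].
first_col = df.iloc[:, 0]
print(first_col.tolist())
```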
Like in Pandas, we can use column names to access columns (Series in Pandas).
However, R data.frame variables aren’t exactly objects, so instead of the .
operator we use $, which accesses named elements within a list.
existing_df$Afghanistan
## X1990 X1991 X1992 X1993 X1994 X1995 X1996 X1997 X1998 X1999 X2000 X2001
## 436 429 422 415 407 397 397 387 374 373 346 326
## X2002 X2003 X2004 X2005 X2006 X2007
## 304 308 283 267 251 238
## 17 Levels: 238 251 267 283 304 308 326 346 373 374 387 397 407 415 ... 436
And finally, since a data.frame is a list of elements (its columns), we can access
columns as list elements using the list indexing operator [[]].
existing_df[[1]]
## X1990 X1991 X1992 X1993 X1994 X1995 X1996 X1997 X1998 X1999 X2000 X2001
## 436 429 422 415 407 397 397 387 374 373 346 326
## X2002 X2003 X2004 X2005 X2006 X2007
## 304 308 283 267 251 238
## 17 Levels: 238 251 267 283 304 308 326 346 373 374 387 397 407 415 ... 436
At this point you should have realised that in R there are multiple ways of doing the same thing, and that this seems to happen more because of the language itself than because somebody wanted to provide different ways of doing things. This strongly contrasts with Python’s philosophy of having one clear way of doing things (the Pythonic way).
For row indexing we have the positional approach.
existing_df[1,]
## Afghanistan Albania Algeria American Samoa Andorra Angola Anguilla
## X1990 436 42 45 42 39 514 38
## Antigua and Barbuda Argentina Armenia Australia Austria Azerbaijan
## X1990 16 96 52 7 18 58
## Bahamas Bahrain Bangladesh Barbados Belarus Belgium Belize Benin
## X1990 54 120 639 8 62 16 65 140
## Bermuda Bhutan Bolivia Bosnia and Herzegovina Botswana Brazil
## X1990 10 924 377 160 344 124
## British Virgin Islands Brunei Darussalam Bulgaria Burkina Faso
## X1990 32 91 43 179
## Burundi Cambodia Cameroon Canada Cape Verde Cayman Islands
## X1990 288 928 188 7 449 10
## Central African Republic Chad Chile China Colombia Comoros
## X1990 318 251 45 327 88 188
## Congo, Rep. Cook Islands Costa Rica Croatia Cuba Cyprus
## X1990 209 0 30 126 32 14
## Czech Republic Cote d'Ivoire Korea, Dem. Rep. Congo, Dem. Rep.
## X1990 22 292 841 275
## Denmark Djibouti Dominica Dominican Republic Ecuador Egypt
## X1990 12 1,485 24 183 282 48
## El Salvador Equatorial Guinea Eritrea Estonia Ethiopia Fiji Finland
## X1990 133 169 245 50 312 68 14
## France French Polynesia Gabon Gambia Georgia Germany Ghana Greece
## X1990 21 67 359 350 51 15 533 30
## Grenada Guam Guatemala Guinea Guinea-Bissau Guyana Haiti Honduras
## X1990 7 103 113 241 404 39 479 141
## Hungary Iceland India Indonesia Iran Iraq Ireland Israel Italy
## X1990 67 5 586 443 50 88 19 11 11
## Jamaica Japan Jordan Kazakhstan Kenya Kiribati Kuwait Kyrgyzstan
## X1990 10 62 19 95 125 1,026 89 90
## Laos Latvia Lebanon Lesotho Liberia Libyan Arab Jamahiriya Lithuania
## X1990 428 56 64 225 476 46 64
## Luxembourg Madagascar Malawi Malaysia Maldives Mali Malta Mauritania
## X1990 19 367 380 159 143 640 10 585
## Mauritius Mexico Micronesia, Fed. Sts. Monaco Mongolia Montserrat
## X1990 53 101 263 3 477 14
## Morocco Mozambique Myanmar Namibia Nauru Nepal Netherlands
## X1990 134 287 411 650 170 629 11
## Netherlands Antilles New Caledonia New Zealand Nicaragua Niger
## X1990 28 112 10 145 317
## Nigeria Niue Northern Mariana Islands Norway Oman Pakistan Palau
## X1990 282 118 142 8 40 430 96
## Panama Papua New Guinea Paraguay Peru Philippines Poland Portugal
## X1990 74 498 95 394 799 88 51
## Puerto Rico Qatar Korea, Rep. Moldova Romania Russian Federation
## X1990 17 71 223 105 118 69
## Rwanda Saint Kitts and Nevis Saint Lucia
## X1990 190 17 26
## Saint Vincent and the Grenadines Samoa San Marino
## X1990 45 36 9
## Sao Tome and Principe Saudi Arabia Senegal Seychelles Sierra Leone
## X1990 346 68 380 113 465
## Singapore Slovakia Slovenia Solomon Islands Somalia South Africa
## X1990 52 55 66 625 597 769
## Spain Sri Lanka Sudan Suriname Swaziland Sweden Switzerland
## X1990 44 109 409 109 629 5 14
## Syrian Arab Republic Tajikistan Thailand Macedonia, FYR Timor-Leste
## X1990 94 193 336 92 706
## Togo Tokelau Tonga Trinidad and Tobago Tunisia Turkey Turkmenistan
## X1990 702 139 45 17 49 83 105
## Turks and Caicos Islands Tuvalu Uganda Ukraine United Arab Emirates
## X1990 42 593 206 67 47
## United Kingdom Tanzania Virgin Islands (U.S.)
## X1990 9 215 30
## United States of America Uruguay Uzbekistan Vanuatu Venezuela
## X1990 7 35 114 278 46
## Viet Nam Wallis et Futuna West Bank and Gaza Yemen Zambia Zimbabwe
## X1990 365 126 55 265 436 409
There we retrieved data for every country in 1990. We can combine this with a column number.
existing_df[1,1]
## X1990
## 436
## 17 Levels: 238 251 267 283 304 308 326 346 373 374 387 397 407 415 ... 436
Or its name.
existing_df$Afghanistan[1]
## X1990
## 436
## 17 Levels: 238 251 267 283 304 308 326 346 373 374 387 397 407 415 ... 436
What did we just do? We retrieved a column, which is a vector, and
accessed that vector’s first element. That way we got the value for Afghanistan for
the year 1990. We can do the same thing using the [[]] operator with a positional
index instead of the list element label.
existing_df[[1]][1]
## X1990
## 436
## 17 Levels: 238 251 267 283 304 308 326 346 373 374 387 397 407 415 ... 436
We can also select multiple columns and/or rows by passing R vectors.
existing_df[c(3,9,16),c(170,194)]
## Spain United Kingdom
## X1992 40 10
## X1998 30 9
## X2005 24 11
Finally, names can also be used in place of positions in the [row, column] indexing syntax.
existing_df["X1992","Spain"]
## X1992
## 40
## Levels: 25 26 27 28 30 33 23 24 34 35 37 40 42 44
And we can combine names with vectors.
existing_df[c("X1992", "X1998", "X2005"), c("Spain", "United Kingdom")]
## Spain United Kingdom
## X1992 40 10
## X1998 30 9
## X2005 24 11
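For comparison, the equivalent label-based and positional selections in Pandas would look roughly like this. This is a sketch using a small stand-in DataFrame with a couple of the values shown above, not the actual TB dataset:

```python
import pandas as pd

# Stand-in data: rows are years, columns are countries, as in existing_df
existing_df = pd.DataFrame(
    {'Spain': [40, 30, 24], 'United Kingdom': [10, 9, 11]},
    index=['X1992', 'X1998', 'X2005'])

# Label-based selection, analogous to R's existing_df[rows, cols] with names
subset = existing_df.loc[['X1992', 'X2005'], ['Spain']]
print(subset)

# Positional selection: note it is 0-based, unlike R's 1-based indexing
print(existing_df.iloc[0, 0])  # 40
```

The main difference to keep in mind is that Pandas separates label-based (`.loc`) from position-based (`.iloc`) indexing, while R overloads a single `[ , ]` operator for both.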
Next Steps
So enough about indexing. In the next part of the tutorial on data frames we will see how to perform more complex data accessing using selection. Additionally, we will explain how to apply functions to a data frame elements, and how to group them.
Remember that all the source code for the different parts of this series of tutorials and applications can be checked at GitHub. Feel free to get involved and share your progress with us!
ShiningPanda
Track your licenses!
Requires.io is proud to introduce a new feature: license tracking for your requirements!
And if a license differs between your version of a package and its latest one, the information appears in both the Requirement and Latest columns.
We hope you will enjoy this new feature!
Python Software Foundation
CSA Awards to Tollervey, Stinner, and Storchaka
We offer them our congratulations, and will turn to telling you about our third quarter award recipients.
RESOLVED, that the Python Software Foundation award the 2015 3rd Quarter Community Service Award to Victor Stinner and Serhiy Storchaka (PSF CSA).
Continuum Analytics Blog
Continuum Analytics - September Tech Events
Our team is gearing up for a big presence at the Strata Hadoop World Conference at the end of this month, where we’ll be presenting theatre talks from our founder and developers, as well as demos, giveaways, and much more. Take a look at where we’ll be all month and let us know at info@continuum.io if you’d like to schedule a meeting.
September 02, 2015
Mozilla Web Development
Node.js static file build steps in Python Heroku apps
I write a lot of webapps. I like to use Python for the backend, but most
frontend tools are written in Node.js. LESS gives me nicer style sheets, Babel
lets me write next-generation JavaScript, and NPM helps manage dependencies
nicely. As a result, most of my projects are polyglots that can be difficult to
deploy.
Modern workflows have already figured this out: Run all the tools. Most
READMEs I’ve written lately tend to look like this:
$ git clone https://github.example.com/foo/bar.git
$ cd bar
$ pip install -r requirements.txt
$ npm install
$ gulp static-assets
$ python ./manage.py runserver
I like to deploy my projects using Heroku. They take care of the messy details
about deployment, but they don’t seem to support multi-language projects easily.
There are Python and Node buildpacks, but no clear way of combining the two.
Multi Buildpack
GitHub is littered with attempts to fix this by building new buildpacks.
The problem is they invariably fall out of compatibility with Heroku. I could
probably fix them, but then I’d have to maintain them. I use Heroku to avoid
maintaining infrastructure; custom buildpacks are one step forward, but two
steps back.
Enter Multi Buildpack, which runs multiple buildpacks at once.
It is simple enough that it is unlikely to fall out of compatibility. Heroku has a
fork of the project on their GitHub account, which implies that it will be
maintained in the future.
To configure the buildpack, first tell Heroku you want to use it:
$ heroku buildpacks:set https://github.com/heroku/heroku-buildpack-multi.git
Next, add a .buildpacks file to your project that lists the buildpacks to run:
https://github.com/heroku/heroku-buildpack-nodejs.git
https://github.com/heroku/heroku-buildpack-python.git
Buildpacks are executed in the order they’re listed in, allowing later
buildpacks to use the tools and scripts installed by earlier buildpacks.
The Problem With Python
There’s one problem: The Python buildpack moves files around, which makes it
incompatible with the way the Node buildpack installs commands. This means that
any asset compilation or minification done as a step of the Python buildpack
that depends on Node will fail.
The Python buildpack automatically detects a Django project and runs
./manage.py collectstatic. But the Node environment isn’t available, so this
fails. No static files get built.
There is a solution: bin/post_compile! If present in your repository, this
script will be run at the end of the build process. Because it runs outside of
the Python buildpack, commands installed by the Node buildpack are available and
will work correctly.
This trick works with any Python webapp, but let’s use a Django project as an
example. I often use Django Pipeline for static asset compilation. Assets
are compiled using the command ./manage.py collectstatic, which, when properly
configured, will call all the Node commands.
#!/bin/bash
export PATH=/app/.heroku/node/bin:$PATH
./manage.py collectstatic --noinput
Alternatively, you could call Node tools like Gulp or Webpack directly.
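For instance, a `bin/post_compile` that invokes Gulp directly might look like the sketch below. This is a deployment-script fragment, not something from the post itself, and the `static-assets` task name is a hypothetical one from your own gulpfile:

```shell
#!/bin/bash
# Make commands installed by the Node buildpack visible on PATH
export PATH=/app/.heroku/node/bin:$PATH

# Run the Node build step directly instead of collectstatic;
# "static-assets" is a placeholder task name from your gulpfile
gulp static-assets
```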
In the case of Django Pipeline, it is also useful to disable the Python
buildpack from running collectstatic, since it will fail anyways. This is done
using an environment variable:
heroku config:set DISABLE_COLLECTSTATIC=1
Okay, so there is a little hack here: we still had to prepend the Node binary
folder to PATH. Pretend you didn’t see that! Or don’t, because you’ll need
to do it in your script too.
That’s it
To recap, this approach:
- Only uses buildpacks available from Heroku
- Supports any sort of Python and/or Node build steps
- Doesn’t require vendoring or pre-compiling any static assets
Woot!
Marcos Dione
breaking-off
Having my own version of the Python parser has proven, so far, to be clumsy and chaotic. Clumsy because it means I need a special interpreter just to run my language (which in any case uses an interpreter!); chaotic because building that interpreter has proven not to work stably across different machines. This means that currently it only works for me.
Because of this and because I wanted even more control over the parser (who said allowing to write things like
rsync(--help)?), I decided to check my options. A friend of mine, more used to
playing with languages, suggested using pypy to create my own parser,
but that just led me a little further: why not outright 'steal' pypy's parser? After all, they have their own, which
is also generated from Python's Python.asdl.
In fact it took me one hour to port the parser and a couple more porting the AST builder. This included porting them
to Python3 (both by running 2to3 and then applying some changes by hand, notably dict.iteritems -> dict.items)
and trying to remove as much dependency on the rest of pypy, specially from rpython.
The last step was to migrate from their own AST implementation to Python's, but here's where (again) I hit the last
brick wall: the ast.AST class and its subclasses are very special. They're implemented in C, but
the Python API does not allow creating nodes with the line and column info. For a moment I contemplated the option of
writing another extension (that is, in C) to make those calls, but then the obvious solution came to mind: a
massive replacement from:
return ast.ASTClass ([params], foo.lineno, foo.column)
into:
new_node = ast.ASTClass ([params])
new_node.lineno = foo.lineno
new_node.column = foo.column
return new_node
and some other similar changes. See here if you're really interested in all the details. I can only be grateful for regular expressions, capturing groups, and editors that support both.
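A sketch of that mechanical rewrite, done with a capturing regex in Python rather than in an editor. The exact pattern used in the real port may have differed; this just shows the shape of the transformation:

```python
import re

# Example input line in the shape described above
line = "return ast.Num([params], foo.lineno, foo.column)"

# Capture the AST class name, the argument list, and the source node variable
pattern = re.compile(
    r"return ast\.(?P<cls>\w+)\s*\((?P<args>.*),\s*"
    r"(?P<node>\w+)\.lineno,\s*(?P=node)\.column\)"
)

def expand(match):
    # Rebuild the call as four lines: construct, set lineno/column, return
    return ("new_node = ast.{cls}({args})\n"
            "new_node.lineno = {node}.lineno\n"
            "new_node.column = {node}.column\n"
            "return new_node").format(cls=match.group("cls"),
                                      args=match.group("args"),
                                      node=match.group("node"))

result = pattern.sub(expand, line)
print(result)
```

An editor with regex search-and-replace and capturing groups does the same job interactively, which is what the post alludes to.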
The following code is able to parse and dump a simple python script:
#! /usr/bin/env python3

import ast
from pypy.interpreter.pyparser import pyparse
from pypy.interpreter.astcompiler import astbuilder

info = pyparse.CompileInfo('setup.py', 'exec')
p = pyparse.PythonParser(None)
t = p.parse_source(open('setup.py').read(), info)
a = astbuilder.ast_from_node(None, t, info)
print(ast.dump(a))
The result is the following (formatted by hand):
Module(body=[
ImportFrom(module='distutils.core', names=[alias(name='setup', asname=None)], level=0),
Import(names=[alias(name='ayrton', asname=None)]),
Expr(value=Call(func=Name(id='setup', ctx=<class '_ast.Load'>), args=None, keywords=[
keyword(arg='name', value=Str(s='ayrton')),
keyword(arg='version', value=Attribute(value=Name(id='ayrton', ctx=<class '_ast.Load'>), attr='__version__', ctx=<class '_ast.Load'>)),
keyword(arg='description', value=Str(s='a shell-like scripting language based on Python3.')),
keyword(arg='author', value=Str(s='Marcos Dione')),
keyword(arg='author_email', value=Str(s='mdione@grulic.org.ar')),
keyword(arg='url', value=Str(s='https://github.com/StyXman/ayrton')),
keyword(arg='packages', value=List(elts=[Str(s='ayrton')], ctx=<class '_ast.Load'>)),
keyword(arg='scripts', value=List(elts=[Str(s='bin/ayrton')], ctx=<class '_ast.Load'>)),
keyword(arg='license', value=Str(s='GPLv3')),
keyword(arg='classifiers', value=List(elts=[
Str(s='Development Status :: 3 - Alpha'),
Str(s='Environment :: Console'),
Str(s='Intended Audience :: Developers'),
Str(s='Intended Audience :: System Administrators'),
Str(s='License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)'), Str(s='Operating System :: POSIX'),
Str(s='Programming Language :: Python :: 3'),
Str(s='Topic :: System'),
Str(s='Topic :: System :: Systems Administration')
],
ctx=<class '_ast.Load'>))
], starargs=None, kwargs=None))
])
The next steps are to continue removing references to pypy code, and to make sure it can actually parse all possible code.
Then I should revisit the hardcoded limitations in the parser (in particular in
this loop) and then
be able to freely format program calls :).
Interesting times are arriving to ayrton!
Update: fixed last link. Thanks nueces!
python ayrton
Kushal Das
Day 4 of Flock 2015
Day four of Flock started at 10am, later than the usual 9am, which was really good, as everyone needed that extra hour of sleep. Though I was generally getting up around 5am, I managed to get enough sleep that morning. I came down to the lobby and found people slowly moving into different rooms. I went to the SPC workshop by Dan Walsh. The session started in a very informal way. As I had already missed his talk on the same topic due to a clash with another talk, this was my chance to catch up with updates from him. I also found more copies of the “Containers coloring book”, another excellent work from Mizmo and Dan’s collaboration. Feel free to download and print the PDF copy. It really explains the security ideas in layman’s terms.
During lunch I went out with Kevin, Patrick, and Pierre-Yves. The salad was one of the best I’ve had, and heavy too. We came back to the venue with full stomachs. Only Patrick’s explanation of the remote client authentication system made sure that I did not fall asleep. He also helped me enable 2-factor authentication for my laptop’s drive encryption. We had many more discussions about best practices, and how to stay paranoid about security :) He also showed me the great documentation from the python-cryptography project. I will explain the use case in a future blog post.
The day ended with another trip to the Belgian beer place. After dinner, many went to more social interactions. But I chose to come back as I had to wake up early next day for the next part of my road trip.
This Flock was very useful, as many discussions happened, which in turn helped resolve many open issues. We also added many new items to our TODO lists, but that is what we expect from any good conference like this one. Having the event venue in the same hotel also helped a lot; many got the required sleep without spending time going back and forth between venue and hotel.
Brian Okken
Why test? (PT003)
Answering a listener question. Why testing? What are the benefits? Why automated testing over manual testing? Why test first? Why do automated testing during development? Why test to the user level API? After describing my ideal test strategy and project, I list: Business related, practical benefits of testing Personal reasons to embrace testing Pragmatic, day […]
The post Why test? (PT003) appeared first on Python Testing.
بايثون العربي
Regular Expressions in Python, Part 2
In the first part of this series we stopped at this code:
import re

if re.search('a*', 'cucumber'):
    print "found it!"
else:
    print "didn't find it :("
The result of this code is found it!. This is because, in the regex world, * means zero or more occurrences of the preceding character: the pattern a* matches even if there is no a at all, since every string, including the empty string, matches a*. That is why the search succeeds on cucumber.
The + sign
This sign matches one or more occurrences, so a+ matches batman but does not match cucumber.
The ^ sign
This sign marks the beginning of the string: the pattern ^j matches joker but does not match banjo.
The $ sign
This sign is the opposite of ^: it marks the end of the string, so g$ matches bag but does not match game.
Time for another table:
Assorted examples of regular expressions
Let’s take a nice example: a program that checks whether a string matches the shape of a phone number such as 0123456789.
import re

number = raw_input("enter phone number: ")
if re.search('^\d\d\d\d\d\d\d\d\d\d$', number):
    print "it's valid!"
else:
    print "nope!"
Parentheses in regular expressions are a special case that I will explain in part three, God willing, which is why I haven’t used them here. In any case, we start the pattern with ^, which marks the beginning of the string, and end it with $, which marks its end; this way we cannot call a number valid if the user enters a string like “blahblahblah3333333333something else”. Between the anchors we put exactly ten digits, and that is all.
I will stop at this small amount so that things don’t get mixed up, and we will continue the rest of the series in part three.
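As a side note, the ten repeated \d tokens can be written more compactly with a repetition count. A small sketch (Python 3 syntax; the function name is just for illustration):

```python
import re

def is_phone_number(s):
    # ^ and $ anchor the match to the whole string; \d{10} is exactly ten digits
    return re.search(r'^\d{10}$', s) is not None

print(is_phone_number('0123456789'))                 # True
print(is_phone_number('blahblah3333333333 else'))    # False
```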
Mike Driscoll
eBook Review: Intermediate Python
I was recently approached by the author of the free eBook, Intermediate Python by Muhammad Yasoob Ullah Khalid to review his work. Yasoob is the fellow behind the Python Tips blog. The book has been released as open source on Github but can be downloaded as a PDF from ReadTheDocs. But before I go into too much detail about the book, here’s my quick review:
Quick Review
- Why I picked it up: I was asked by the author to read the book.
Book Formats
You can get this book in PDF, ePub, Mobi, HTML or in its source, which is in RestructuredText. You can purchase the eBook here.
Book Contents
The book is split up into 24 chapters covering 75 pages at the time of writing this review, but that may change since it’s open source.
Full Review
The book covers a lot of material, but doesn’t go into depth on any topic. It reminds me a lot of my own book, actually. There isn’t a bunch of introductory material and each chapter gives the minimum amount of information needed to grasp the topic. Some topics do get a bit more coverage than others. Each chapter is between 2 and 8 pages in length. I should mention that the book is rather rough and reads like a first draft. The Table of Contents is empty, for example. English does not appear to be the author’s first language, so some of the sentences can be a bit awkward as well. One of the benefits of being open source is that anyone can come along and fix these minor issues though.
Let’s spend some time talking about what the book covers.
The first few chapters go over *args / **kwargs, debugging, generators, and map/filter. At this point, I’m sure some of my readers will question whether or not the book is really covering intermediate level material as some will argue that *args / **kwargs or the map built-in are more beginner material. Frankly, there’s a fine line between what’s considered beginner and intermediate, so I’m not really going to go there. The book is free so you can make your own determination. Besides, there are plenty of intermediate level material to be had here.
The next few chapters go over decorators, mutation, __slots__ and virtualenv, among other things. You will also find chapters on the collections module, object introspection, coroutines, lambdas, function caching and context managers. There are a bunch of other chapters that cover such items as exception handling, globals, enumerate, comprehensions, virtualenv, ternary operators, etc.
I found the book interesting and it does cover a wide variety of information. The topics don’t seem to be grouped in a logical order though. Overall, I think the average Python programmer will find some nuggets of information to be gleaned from this book and with the entry price being free, I think it’s worth checking out. If you do like the book, you should support the author by purchasing the book.
Intermediate Python by Muhammad Yasoob Ullah Khalid
Other Book Reviews
- IPython Notebook Essentials by L. Felipe Martins
- Creating Apps in Kivy by Dusty Phillips
- Kivy – Interactive Applications in Python by Roberto Ulloa
- Instant Flask Web Development by Ron DuPlain
- Real Python by Fletcher Heisler
- Python 3 Object Oriented Programming by Dusty Phillips
PyCharm
PyCharm Educational Edition 2.0 is coming soon
Today, we’re excited to let you know that PyCharm Educational Edition 2.0 is coming this September. Since the previous release we’ve improved a lot of things, implemented new functionality, added new programming courses, and fixed bugs.
The best things about this edition will stay unchanged – it’s still going to be a completely free and open source software, specifically designed to help beginners with little or no previous coding experience to learn programming quickly and efficiently, while using a modern professional tool.
The list of improvements can be found on the Coming in v2.0 page. They are:
- Simplified Step-by-Step Debugger
- Inline Debugger
- Simplified UI
- Scratch Files
- Quick Package Installation
- Integration with Stepic
- Various course creation improvements
We encourage you to sign up for PyCharm Educational Edition 2.0 Preview and use it in the upcoming educational year.
Learn programming and educate with pleasure!
JetBrains PyCharm Team
Codementor
Building Data Products with Python: A Wine Review Website using Django and Bootstrap
Introduction
With this tutorial, we start a series of tutorials about how to build data products with Python. As a leitmotif we want to build a web-based wine reviews and recommendations website using Python technologies such as Django and Pandas. We have chosen to build a wine reviews and recommendations website, but the concepts and the technology stack can be applied to any user reviews and recommendation product.
We want this tutorial to leave you with a product that you can adapt and show as part of your portfolio. With this goal in mind, we will explain how to set up a Koding virtual machine and use it as a Django and Pandas + Scikit-learn Python development server.
Then we will start a Django project and a Django app for our Wine recommender web application. It will be an incremental process that can be followed by checking out individual tags in our GitHub repo. By doing so you can work in those individual tasks at a given stage that you find more interesting or difficult.
In the next tutorial, we will add user management and, once we have users identified, proceed to generate user recommendations using machine learning.
Remember that you can follow the tutorial at any development stage by forking the repo into your own GitHub account, and then cloning into your workspace and checking out the appropriate tag. By forking the repo you are free to change it as you like and experiment with it as much as you need. If at any point you feel like having a little bit of help with some step of the tutorial, or with making it your own, we can have a 1:1 Codementor session about it.
We will usually have the latest stage of the app deployed and running at Koding, including all the related tutorial updates. But let’s start our project from scratch!
Koding as a development server
Koding is a cloud-based development environment complete with free virtual machines (VMs), an attractive IDE and sudo level terminal access. It can be used as a software development playground and everything you need to move your software development to the cloud! A Koding VM has many of the popular software packages and programming languages preinstalled so that you can focus on learning vs installing and configuring.
We have used Koding while working in this tutorial, so it is a good option if you want to follow it. We even have the latest version of the website deployed in a test server running in our Koding VM (can be found here). However, this is not a requirement and you can work on the tutorial on your own machine or any other system that can install Python and the packages we need. About this, we recommend installing Anaconda that comes with all the analytics packages we will need in the later phases of the product.
So if you want to follow our very same steps, the first thing to do is sign up for Koding. The only problem with the free account is that it comes with 3GB of disk space and we need at least 4GB. You will need to invite a couple of friends using your referral link to get more space.
Once you have your Koding VM up and running, go to the Anaconda website and download and install the version of Anaconda for your OS.
Once this is done, the last bit is to install Django. I installed it using the pip version that comes with Anaconda. That is, from the root folder that contains your Anaconda installation, just type:
./anaconda/bin/pip install django
The GitHub repo
One of the coolest things about this tutorial is that all the code is available at GitHub, and that each section is tagged in a different tag (e.g. stage-0 is the empty repo). So go to the repo and fork it to your GitHub account, and then clone it into your development server (Koding in our case).
You can check out any tag and create a branch from that point, and then work in your changes. Or you can just work on your own and use the code in the repo to copy and paste and complete your files. For example, if we want to check out the first tag that has actually some work done, type from the root folder in the cloned repo:
git checkout stage-0.1
The core of our Django web application
A Django project lifecycle is managed by two commands. First we use django-admin.py to create a project. Then we use python manage.py COMMAND to do anything else. So let’s start by creating a Django project.
Starting up the project with startproject
Create a directory where you want to place your Django project and move there using cd. Then run the django-admin.py command to create the project as follows.
django-admin.py startproject winerama
Let’s look at what we just created by running the startproject command.
tree winerama
winerama
|-- manage.py
`-- winerama
|-- __init__.py
|-- settings.py
|-- urls.py
`-- wsgi.py
This requires a bit of an explanation:
- The outer winerama/ directory is a container for our project.
- manage.py is a command-line utility that allows us to interact with our project in various ways. We will use it all the time, so its purpose will become clear in a minute.
- The inner winerama/ directory is the actual Python package for our project. Its name is also the Python package name for the project files.
- winerama/__init__.py is an empty file that tells Python that this directory should be considered a Python package.
- winerama/settings.py is the settings/configuration file for our project.
- winerama/urls.py contains the URL declarations for this Django project.
- winerama/wsgi.py is an entry point for WSGI-compatible web servers to serve our project.
Running the server with runserver
python manage.py runserver 0.0.0.0:8000
Then we can go to our Koding server public URL http://KODING_USERNAME.koding.io:8000 where we replace KODING_USERNAME with our Koding user and check how the website looks so far.
Database setup
Now, open up winerama/settings.py. It’s a normal Python module with module-level variables representing Django settings.
By default, Django uses SQLite, and we will stick to it. SQLite is included in Python, so you won’t need to install anything else to support your database. If you plan to deploy this project into production, you better move to some production-ready database such as PostgreSQL to avoid database-switching headaches down the road.
So we don’t need to change anything in our settings file. If anything, change winerama/settings.py to set TIME_ZONE to your time zone.
In any case, some of the installed applications (more on this later) make use of at least one database table, so we need to create the tables in the database before we can use them. To do that, run the following command:
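For reference, the default database configuration that startproject generates in settings.py looks roughly like this (a sketch of the generated file; paths on your machine may differ):

```python
import os

# Build paths inside the project, as in the generated settings.py
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

# SQLite needs no server or credentials: the whole database lives
# in a single file next to manage.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    }
}
```

Switching to PostgreSQL later is mostly a matter of changing ENGINE and adding NAME, USER, PASSWORD, and HOST entries for that server.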
python manage.py migrate
Apps vs Projects
Now we are ready to start our wine reviews app. But wait, what is the difference between an app and a project? From the Django website:
What’s the difference between a project and an app? An app is a Web application that does something – e.g., a Weblog system, a database of public records or a simple poll app. A project is a collection of configuration and apps for a particular Web site. A project can contain multiple apps. An app can be in multiple projects. […] Django apps are “pluggable”: You can use an app in multiple projects, and you can distribute apps, because they don’t have to be tied to a given Django installation.
So in our case, our Winerama project will contain our first app, reviews, which will allow users to add wine reviews. To do that, from the root winerama folder where manage.py is, we use the startapp command as follows.
python manage.py startapp reviews
This will create the following folder structure.
tree reviews
reviews
|-- __init__.py
|-- admin.py
|-- migrations
| `-- __init__.py
|-- models.py
|-- tests.py
`-- views.py
We will get to know those files in the following sections, while creating model entities, views, and setting up the admin interface.
But first we need to activate our reviews app. Edit the winerama/settings.py file, and change the INSTALLED_APPS setting to include the string ‘reviews’ as follows.
INSTALLED_APPS = (
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'reviews',
)
Adding model entities
In this first stage, our wine reviews app will contain two model entities: Wine and Review. A Wine has just a name. A Review has four fields: a name for the user that made the review, a wine rating, a publication date, and a text review. Additionally, each Review is associated with a Wine.
These two entities are represented by Python classes that we add to the reviews/models.py file as follows.
from django.db import models
import numpy as np
class Wine(models.Model):
name = models.CharField(max_length=200)
def average_rating(self):
all_ratings = map(lambda x: x.rating, self.review_set.all())
return np.mean(all_ratings)
def __unicode__(self):
return self.name
class Review(models.Model):
RATING_CHOICES = (
(1, '1'),
(2, '2'),
(3, '3'),
(4, '4'),
(5, '5'),
)
wine = models.ForeignKey(Wine)
pub_date = models.DateTimeField('date published')
user_name = models.CharField(max_length=100)
comment = models.CharField(max_length=200)
rating = models.IntegerField(choices=RATING_CHOICES)
Each of our two model entities is represented by a class that subclasses django.db.models.Model. Each model class variable represents a database field in the model and is represented by an instance of a Field sub-class. This specifies what type of data each field holds. The field name is its code reference in machine-friendly format, and our database will use it as the column name.
You can use an optional first positional argument to a Field to designate a human-readable name. If this field isn’t provided, Django will use the machine-readable name.
Some field classes have required and optional arguments, such as max_length for CharField. These are used both in the database schema and in validation.
Finally, we specify relationships between entities by using a ForeignKey field. That tells Django each Review is related to a single Wine.
In Django, model classes can also define methods providing domain functionality. In our case, we have defined a method to get the average score for a given wine, based on all the reviews associated with it (i.e. the method average_rating(self)).
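Stripped of the Django ORM, the average_rating logic just takes the mean of the related reviews’ ratings. Here is a minimal plain-Python sketch of that idea (the FakeReview class is a hypothetical stand-in for a Review row, and we use the standard library’s statistics.mean in place of np.mean to keep it dependency-free):

```python
from statistics import mean  # stdlib stand-in for np.mean


class FakeReview:
    """Hypothetical stand-in for a Review instance; only the rating field matters here."""
    def __init__(self, rating):
        self.rating = rating


def average_rating(reviews):
    # Same idea as Wine.average_rating: collect every review's rating and average them
    all_ratings = [review.rating for review in reviews]
    return mean(all_ratings)


reviews = [FakeReview(4), FakeReview(5), FakeReview(3)]
print(average_rating(reviews))  # averages 4, 5 and 3
```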
Since we have made changes to our model, we need to propagate that to our database. We will do that by running the following command that creates migrations for our changes.
python manage.py makemigrations reviews
And now we can apply our migrations to the database as follows, without losing data.
python manage.py migrate
This point of the project corresponds to the git tag stage-0.1.
Providing an Admin Site
First of all, we need to create an admin user by running the following command from the winerama/ root folder that contains manage.py.
python manage.py createsuperuser
You’ll be prompted for a user name, email address, and password. Enter values that work for you; you will use them to log in and administer the system.
The admin site is activated by default. Let’s explore it. If your website is not up and running, use the following command.
python manage.py runserver 0.0.0.0:8000
And now you can navigate to http://KODING_USERNAME:8000/admin/ (remember to replace KODING_USERNAME with your actual Koding username) and login with the user and password that you specified before.
You will notice that our model entities are not modifiable in the admin site. In order for them to be there, we need to add them in the reviews/admin.py file so it looks like this.
from django.contrib import admin
from .models import Wine, Review
class ReviewAdmin(admin.ModelAdmin):
    model = Review
    list_display = ('wine', 'rating', 'user_name', 'comment', 'pub_date')
    list_filter = ['pub_date', 'user_name']
    search_fields = ['comment']

admin.site.register(Wine)
admin.site.register(Review, ReviewAdmin)
We are basically importing the model classes we just defined, and then using register() to tell Django that we want wines and reviews to be available in the admin site.
If we navigate again to the admin site, we will see a new Reviews section with Wines and Reviews elements inside. These elements include two action buttons, Add and Change. Add can be used to introduce new wines or wine reviews. The forms are automatically generated from the Wine and Review models.
But with wine reviews we have done a little extra work and defined a custom admin class. There are many things we can do with this class, but in our case we have specified:
- What columns (and in what order) we want to display in the entries list. That is, when using the Change button or when navigating to admin/reviews/wine or admin/reviews/review, we will see a list of all added entries. How the entries are listed when we go to the reviews section is specified by the list_display field in the ReviewAdmin class.
- A list of filters that can be used to list reviews (list_filter).
- A list of fields that will be matched when using the search box (search_fields).
We suggest you experiment with this ReviewAdmin class (or create your own WineAdmin one) until you’re happy with the results.
We have accomplished a lot just by writing a few classes. Django is really powerful when it comes to providing an admin interface. In the next section, we will work on our actual user interface to add wine reviews.
This stage of the project corresponds to the git tag stage-0.2.
Adding Web Views
In Django, a view is a type of Web page that generally serves a specific function and has a specific template. The concept is taken from the Model-View-Controller architectural pattern so common in web application frameworks.
In Django, a view is actually a Python function that delivers a web page (and other content). When a user navigates to a URL within our application, Django will choose a view by examining the URL that’s requested.
But let’s show this in practice.
The first thing we need to do is to include our reviews app in the project URLs. In order to do so, edit the winerama/urls.py file (not the reviews one yet), and modify the urlpatterns list there to look like the following.
urlpatterns = [
    url(r'^reviews/', include('reviews.urls', namespace="reviews")),
    url(r'^admin/', include(admin.site.urls)),
]
Basically we have added a new mapping specifying that all requests starting with reviews/ will be passed to our reviews app url mapping under the namespace reviews.
Mappings can extract parts of the url and pass it as a parameter to the handling view. For example, the mapping:
url(r'^review/(?P<review_id>[0-9]+)/$', views.review_detail, name='review_detail')
extracts a number after the reviews/review/ part and passes it to the review_detail function defined later on.
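The capturing is ordinary regular-expression matching: the named group in the pattern pulls the ID out of the path, and Django passes it to the view as a keyword argument. We can reproduce the extraction with Python’s re module (this snippet is just an illustration of the mechanism, not Django code):

```python
import re

# The same pattern used in the mapping above; the named group
# captures the numeric ID from the requested path.
pattern = re.compile(r'^review/(?P<review_id>[0-9]+)/$')

match = pattern.match('review/5/')
print(match.group('review_id'))  # '5'

# Non-matching paths return None, so Django moves on to the next pattern.
print(pattern.match('review/abc/'))  # None
```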
So in order for this mapping to work, we need to add it in our reviews/urls.py file. Let’s add some of them. Change the reviews/urls.py file to look as follows.
from django.conf.urls import url
from . import views
urlpatterns = [
    # ex: /
    url(r'^$', views.review_list, name='review_list'),
    # ex: /review/5/
    url(r'^review/(?P<review_id>[0-9]+)/$', views.review_detail, name='review_detail'),
    # ex: /wine/
    url(r'^wine$', views.wine_list, name='wine_list'),
    # ex: /wine/5/
    url(r'^wine/(?P<wine_id>[0-9]+)/$', views.wine_detail, name='wine_detail'),
]
The mapping structure is the same as before. For example, we specify that any request starting with an empty string (plus the reviews/ prefix we added at the project level) will be handled by a function called review_list defined in our reviews/views.py file, and will be referenced within the local namespace (remember we gave the namespace reviews in the project winerama/urls.py) with the name review_list.
So now we need the actual views. As we said, these are just Python functions that query model entities as required and decide how to render the results (as an HTML page in our case). Modify the reviews/views.py file to look like the following.
from django.shortcuts import get_object_or_404, render
from .models import Review, Wine
def review_list(request):
    latest_review_list = Review.objects.order_by('-pub_date')[:9]
    context = {'latest_review_list': latest_review_list}
    return render(request, 'reviews/review_list.html', context)


def review_detail(request, review_id):
    review = get_object_or_404(Review, pk=review_id)
    return render(request, 'reviews/review_detail.html', {'review': review})


def wine_list(request):
    wine_list = Wine.objects.order_by('-name')
    context = {'wine_list': wine_list}
    return render(request, 'reviews/wine_list.html', context)


def wine_detail(request, wine_id):
    wine = get_object_or_404(Wine, pk=wine_id)
    return render(request, 'reviews/wine_detail.html', {'wine': wine})
There we have defined four different views for each of the four different url mappings we specified previously. Each function gets at least a request object parameter, and optionally more parameters as specified in the url mapping. For example, the review_detail function gets also a review_id parameter as we specified in the mapping.
Once we are inside the view function, we normally do some model query, and create a context object with the results. Queries are normally performed by using the .objects attribute in the given domain entity class (e.g. Review.objects). We can apply different sorting methods, filters, etc. until we get the desired results (have a look here for more about query sets). This context object is then passed to the render function together with the reference to the template file that will generate the resulting web page.
In the order they appear in the file, the views are:
- review_list: gets a list of the latest 9 reviews and renders it using reviews/review_list.html.
- review_detail: gets a review given its ID and renders it using review_detail.html.
- wine_list: gets all the wines sorted by name and passes them to wine_list.html to be rendered.
- wine_detail: gets a wine from the DB given its ID and renders it using wine_detail.html.
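The queryset work in review_list (ordering by publication date, newest first, and keeping the first nine) behaves much like sorting and slicing a Python list. A rough framework-free analogue, using hypothetical dictionaries in place of Review instances:

```python
from datetime import date

# Hypothetical review rows; in Django these would be Review instances.
reviews = [
    {'comment': 'ok', 'pub_date': date(2015, 1, 1)},
    {'comment': 'great', 'pub_date': date(2015, 3, 1)},
    {'comment': 'meh', 'pub_date': date(2015, 2, 1)},
]

# Review.objects.order_by('-pub_date')[:9] roughly corresponds to:
latest = sorted(reviews, key=lambda r: r['pub_date'], reverse=True)[:9]
print([r['comment'] for r in latest])  # ['great', 'meh', 'ok']
```

Unlike this sketch, a real queryset translates the ordering and slicing into SQL, so the database does the work.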
And now, we need to create the HTML templates that will generate the final pages. These are expressed in Django template language. This language allows us to put variable placeholders and control structures within HTML code in order to dynamically generate the final web page.
For example, this is what the review_list view template looks like.
<h2>Latest reviews</h2>
{% if latest_review_list %}
<div>
{% for review in latest_review_list %}
<div>
<h4><a href="{% url 'reviews:review_detail' review.id %}">
{{ review.wine.name }}
</a></h4>
<h6>rated {{ review.rating }} of 5 by {{ review.user_name }}</h6>
<p>{{ review.comment }}</p>
</div>
{% endfor %}
</div>
{% else %}
<p>No reviews are available.</p>
{% endif %}
This template makes use of some Django template language structures. All of them are enclosed within {% ... %} elements. For example, the {% if latest_review_list %}{% else %}{% endif %} directive is like any other if-else structure in computer programming. The first section will execute if the object latest_review_list exists, and the second section will do otherwise. The latest_review_list object is part of the context object we build and pass to render in the view function defined in reviews/views.py. The for block is equivalent to a programming for, and so on. Object methods and fields are accessed using the dot notation.
Django also provides us with a {% url ... %} tag we can use together with namespaces in order not to hardcode URLs within templates, and we make use of it in our template as we can see.
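The {% url %} tag does in templates what Django’s reverse() function does in Python code: it looks a pattern up by its (namespaced) name and fills in the arguments, so paths are never hardcoded. A toy illustration of the idea (the lookup table and toy_reverse helper are invented for this example, not Django’s API):

```python
# Invented name -> path-template table, mimicking what URL reversing does.
URL_NAMES = {
    'reviews:review_detail': 'reviews/review/{}/',
    'reviews:wine_detail': 'reviews/wine/{}/',
}


def toy_reverse(name, *args):
    # Look the pattern up by its namespaced name and substitute the arguments,
    # so callers never spell out the path themselves.
    return '/' + URL_NAMES[name].format(*args)


print(toy_reverse('reviews:review_detail', 5))  # /reviews/review/5/
```

If a URL pattern later changes, only the table (in Django, the urls.py entry) changes; every template and view that refers to the name keeps working.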
You can check what all the views look like in the stage-0.3 tag of the project repo. We don’t mean to explain all the details of the Django template language here; the curious reader can have a look at the official Django documentation.
Therefore, adding new mappings is as easy as adding more elements to the urlpatterns list. Once we do that, we need to add the appropriate function in reviews/views.py and create the HTML page that will render the results.
We have added a couple of wines and reviews to the database, so you can navigate to http://KODING_USERNAME:8000/reviews/ and see it in action. This is one of four possible web pages we can visit in our app. Are you able to check all of them using your browser from what we have seen in reviews/urls.py? Try to construct the urls.
This point of the project corresponds to the git tag stage-0.3.
Adding Reviews Using Forms
In this section we will explain how one of our views will include a form that will be used to add wine reviews.
This view is the wine detail view. Remember that when we show a wine’s details, we also show some of its recent reviews. What we want to do is provide also an HTML form that can be used to add a new review for that particular wine.
Let’s start with the easy part. This is what the wine detail view, including a form, looks like.
<h2>{{ wine.name }}</h2>
<h5>{{ wine.review_set.count }} reviews ({{ wine.average_rating | floatformat }} average rating)</h5>
<h3>Recent reviews</h3>
{% if wine.review_set.all %}
<div>
{% for review in wine.review_set.all %}
<div>
<em>{{ review.comment }}</em>
<h6>Rated {{ review.rating }} of 5 by {{ review.user_name }}</h6>
<h5><a href="{% url 'reviews:review_detail' review.id %}">
Read more
</a></h5>
</div>
{% endfor %}
</div>
{% else %}
<p>No reviews for this wine yet</p>
{% endif %}
<h3>Add your review</h3>
{% if error_message %}<p><strong>{{ error_message }}</strong></p>{% endif %}
<form action="{% url 'reviews:add_review' wine.id %}" method="post">
{% csrf_token %}
{{ form.as_p }}
<input type="submit" value="Add" />
</form>
You can see that we didn’t really include any form fields there. What we are doing here is using Django template language to leave a {{ form.as_p }} object to be rendered with the appropriate fields (as <p> HTML elements). We define this form class as a ModelForm in the file reviews/forms.py as follows.
from django.forms import ModelForm, Textarea
from reviews.models import Review
class ReviewForm(ModelForm):
    class Meta:
        model = Review
        fields = ['user_name', 'rating', 'comment']
        widgets = {
            'comment': Textarea(attrs={'cols': 40, 'rows': 15}),
        }
The ReviewForm class specifies the model object it’s going to use as a base (i.e. Review), a selection of fields to use, and also what widget to use for one of them (the comment field).
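Under the hood, a ModelForm derives its validation rules from the model fields: rating must be one of the declared choices, and comment must fit within its max_length. A simplified, framework-free sketch of those two checks (the validate_review helper is hypothetical, just to make the rules concrete; it is not Django’s API):

```python
def validate_review(data):
    """Collect validation errors roughly the way a ModelForm would for our Review fields."""
    errors = {}
    if data.get('rating') not in (1, 2, 3, 4, 5):  # choices=RATING_CHOICES
        errors['rating'] = 'Select a valid choice.'
    if len(data.get('comment', '')) > 200:         # max_length=200 on the comment field
        errors['comment'] = 'Ensure this value has at most 200 characters.'
    return errors


print(validate_review({'rating': 4, 'comment': 'Lovely'}))   # {} -> valid
print(validate_review({'rating': 9, 'comment': 'x' * 300}))  # errors for both fields
```

With a real ModelForm, these checks run when we call form.is_valid(), and the errors are rendered next to each field by {{ form.as_p }}.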
The first time we display the wine details, we need to pass a new empty form object. We add that in our wine_detail view function. We also add the add_review function view in reviews/views.py. The file will look as follows.
from django.shortcuts import get_object_or_404, render
from django.http import HttpResponseRedirect
from django.core.urlresolvers import reverse

from .models import Review, Wine
from .forms import ReviewForm

import datetime


def review_list(request):
    latest_review_list = Review.objects.order_by('-pub_date')[:9]
    context = {'latest_review_list': latest_review_list}
    return render(request, 'reviews/review_list.html', context)


def review_detail(request, review_id):
    review = get_object_or_404(Review, pk=review_id)
    return render(request, 'reviews/review_detail.html', {'review': review})


def wine_list(request):
    wine_list = Wine.objects.order_by('-name')
    context = {'wine_list': wine_list}
    return render(request, 'reviews/wine_list.html', context)


def wine_detail(request, wine_id):
    wine = get_object_or_404(Wine, pk=wine_id)
    form = ReviewForm()
    return render(request, 'reviews/wine_detail.html', {'wine': wine, 'form': form})


def add_review(request, wine_id):
    wine = get_object_or_404(Wine, pk=wine_id)
    form = ReviewForm(request.POST)
    if form.is_valid():
        rating = form.cleaned_data['rating']
        comment = form.cleaned_data['comment']
        user_name = form.cleaned_data['user_name']
        review = Review()
        review.wine = wine
        review.user_name = user_name
        review.rating = rating
        review.comment = comment
        review.pub_date = datetime.datetime.now()
        review.save()
        # Always return an HttpResponseRedirect after successfully dealing
        # with POST data. This prevents data from being posted twice if a
        # user hits the Back button.
        return HttpResponseRedirect(reverse('reviews:wine_detail', args=(wine.id,)))
    return render(request, 'reviews/wine_detail.html', {'wine': wine, 'form': form})
The add_review function is in charge of validating the form and creating the new review instance. The first thing it does is use the wine ID from the request URL to look up the wine we are going to review, returning a 404 page if it doesn’t find it. Otherwise, it creates a ReviewForm instance from the request POST data.
We can validate the form with a single call to form.is_valid(). When the form is not valid, we just render the wine detail page again, passing in the original form so it can be corrected. If the form is indeed valid, we create the review object, save it, and redirect to the wine detail page again. Here we don’t render the page directly but redirect to the wine_detail view with the appropriate wine ID.
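This control flow is the classic Post/Redirect/Get pattern: on success, save and redirect; on failure, re-render with the bound form. Stripped to its skeleton (the handle_post function and its return values are illustrative, not Django’s API):

```python
def handle_post(form_is_valid, save, redirect_url):
    """Skeleton of the Post/Redirect/Get pattern used by add_review."""
    if form_is_valid:
        save()                             # persist the new review
        return ('redirect', redirect_url)  # GET after POST: the Back button is safe
    return ('render', 'wine_detail.html')  # show the form again with its errors


saved = []
print(handle_post(True, lambda: saved.append('review'), '/reviews/wine/1/'))
# ('redirect', '/reviews/wine/1/'), and the review was saved

print(handle_post(False, lambda: saved.append('review'), '/reviews/wine/1/'))
# ('render', 'wine_detail.html'), and nothing extra was saved
```

Redirecting rather than rendering after a successful POST is what prevents a browser refresh from re-submitting the form.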
Now we need to connect everything together. The action attribute in the form HTML element specifies which URL handles the POST request once the form is submitted. Then we need to create the appropriate URL mapping in reviews/urls.py. Modify that file so it looks like the following.
from django.conf.urls import url
from . import views
urlpatterns = [
    # ex: /
    url(r'^$', views.review_list, name='review_list'),
    # ex: /review/5/
    url(r'^review/(?P<review_id>[0-9]+)/$', views.review_detail, name='review_detail'),
    # ex: /wine/
    url(r'^wine$', views.wine_list, name='wine_list'),
    # ex: /wine/5/
    url(r'^wine/(?P<wine_id>[0-9]+)/$', views.wine_detail, name='wine_detail'),
    url(r'^wine/(?P<wine_id>[0-9]+)/add_review/$', views.add_review, name='add_review'),
]
With this, we can navigate to any wine detail page (e.g. http://KODING_USERNAME:8000/reviews/wine/1/) and add a new review. Don’t worry too much about the page not being very attractive. We will solve this in the next two sections.
This point of the project corresponds to the git tag stage-0.4.
Template Reuse
We have views that are useful. We can browse our wine and reviews, and we can add new reviews. Moreover, some views have links that allow us to navigate between them. However, we need some reference links that allow us to go to a couple of reference views, independently of where we are in the navigation flow. That is, we want a menu.
An option would be to replicate a list of links in every single page we have (the four of them so far). But we know this is not very good practice. Fortunately, the Django template language allows us to extend templates. Thanks to this, we can define a base template containing the menu and the main page structure, and then make each of our four views extend this base template.
This is how the reviews/templates/reviews/base.html template will look like.
<div>
<nav>
<div>
<a href="{% url 'reviews:review_list' %}">Winerama</a>
</div>
<div id="navbar">
<ul>
<li><a href="{% url 'reviews:wine_list' %}">Wine List</a></li>
<li><a href="{% url 'reviews:review_list' %}">Home</a></li>
</ul>
</div>
</nav>
<h1>{% block title %}(no title){% endblock %}</h1>
{% block content %}(no content){% endblock %}
</div>
There are three main sections in this base template. First we define a navigation bar between <nav> tags. There we have three links: the first one is the branding section that also acts as a home link; the other two are the menu elements, one taking the user to the wine list and the other one again to the home page. The home page is the latest reviews view, as defined in our urls file.
The other two sections define the structure for all our views. They are composed of a title block and a content block. We use the Django template language {% block ... %} directive for that purpose.
Then we have to make each of the HTML templates we created for our views extend this base template. For example, the review_list.html template will look as follows.
{% extends 'reviews/base.html' %}
{% block title %}
<h2>Latest reviews</h2>
{% endblock %}
{% block content %}
{% if latest_review_list %}
<div>
{% for review in latest_review_list %}
<div>
<h4><a href="{% url 'reviews:review_detail' review.id %}">
{{ review.wine.name }}
</a></h4>
<h6>rated {{ review.rating }} of 5 by {{ review.user_name }}</h6>
<p>{{ review.comment }}</p>
</div>
{% endfor %}
</div>
{% else %}
<p>No reviews are available.</p>
{% endif %}
{% endblock %}
The first line is always an {% extends 'reviews/base.html' %} directive that specifies that our template extends the base template. By doing this, our review_list.html will automatically include the navigation menu. Then we define the content of the two blocks declared in the base template, and Django will replace the base blocks with the ones we define here.
Check the rest of the views in the tutorial repo. This point of the project corresponds to the git tag stage-0.5.
Styling with Bootstrap
Bootstrap is a popular HTML, CSS, and JS framework for developing responsive, mobile first projects on the web. Our web pages look rather dull and visually disorganised. We will use Bootstrap to make them provide a better user experience.
The easiest (and cleanest) way to use Bootstrap for a Django project is to install and use the Bootstrap 3 for Django app. Installation is as easy as going to our Koding terminal and running, using the Anaconda pip command, the following:
pip install django-bootstrap3
Once we do that, we can change our Winerama project settings to include the Bootstrap 3 for Django apps by defining the INSTALLED_APPS list as follows.
INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'bootstrap3',
    'reviews',
)
Now we can use Bootstrap in our templates. First we add it to our base.html template to leave it like the following.
{% load bootstrap3 %}
{% bootstrap_css %}
{% bootstrap_javascript %}
{% block bootstrap3_content %}
<div class="container">
<nav class="navbar navbar-default">
<div class="navbar-header">
<a class="navbar-brand" href="{% url 'reviews:review_list' %}">Winerama</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li><a href="{% url 'reviews:wine_list' %}">Wine list</a></li>
<li><a href="{% url 'reviews:review_list' %}">Home</a></li>
</ul>
</div>
</nav>
<h1>{% block title %}(no title){% endblock %}</h1>
{% bootstrap_messages %}
{% block content %}(no content){% endblock %}
</div>
{% endblock %}
We have modified the base template file following the Bootstrap 3 for Django documentation. Basically, we load its template tag library and then assign some Bootstrap classes to different HTML elements. Nothing special here.
The rest of the templates have little modification. We just add some Bootstrap classes where needed. The only one that has major modifications is the wine_detail.html template. It will look as follows:
{% extends 'reviews/base.html' %}
{% load bootstrap3 %}
{% block title %}
<h2>{{ wine.name }}</h2>
<h5>{{ wine.review_set.count }} reviews ({{ wine.average_rating | floatformat }} average rating)</h5>
{% endblock %}
{% block content %}
<h3>Recent reviews</h3>
{% if wine.review_set.all %}
<div class="row">
{% for review in wine.review_set.all %}
<div class="col-xs-6 col-lg-4">
<em>{{ review.comment }}</em>
<h6>Rated {{ review.rating }} of 5 by {{ review.user_name }}</h6>
<h5><a href="{% url 'reviews:review_detail' review.id %}">
Read more
</a></h5>
</div>
{% endfor %}
</div>
{% else %}
<p>No reviews for this wine yet</p>
{% endif %}
<h3>Add your review</h3>
{% if error_message %}<p><strong>{{ error_message }}</strong></p>{% endif %}
<form action="{% url 'reviews:add_review' wine.id %}" method="post" class="form">
{% csrf_token %}
{% bootstrap_form form layout='inline' %}
{% buttons %}
<button type="submit" class="btn btn-primary">
{% bootstrap_icon "star" %} Add
</button>
{% endbuttons %}
</form>
{% endblock %}
That results in the following web page.
See how, first, we need {% load bootstrap3 %} in order to use the Bootstrap 3 for Django tags (like the ones we use in the form). Apart from the classes added to divs, we have used the {% bootstrap_form form layout='inline' %} directive in order to render the form Bootstrap-style, including a button with a star icon.
This point of the project, the last stage for this part of the tutorial, corresponds to the git tag stage-1.
Conclusions and Future Works
In this tutorial we have explained how to set up Koding as a Django/Pandas development server. Then we have started a Django project and a Django app for our Wine recommender web application. We have added some model entities, and explained how to create views and forms for them. We have closed this first part of the tutorial by adding some style by using Bootstrap.
In the next part of the tutorial, we will add user management and, once we have users identified, proceed to generate user recommendations using machine learning.
The whole tutorial can be followed by checking out individual tags in our GitHub repo. By doing so you can work on the individual tasks at whatever stage you find most interesting or difficult. We will usually have the latest stage of the app deployed and running at Koding.
Building Data Products with Python: Adding User Management to a Django website
This is the second tutorial on our series on how to build data products with Python. Remember that as a leitmotif we want to build a web-based wine reviews and recommendations website using Python technologies such as Django and Pandas. We have chosen to build a wine reviews and recommendations website, but the concepts and the technology stack can be applied to any user reviews and recommendation product.
We want these tutorials to leave you with a product that you can adapt and show as part of your portfolio. With this goal in mind, we will explain how to set up a Koding virtual machine and use it as a Django and Pandas + Scikit-learn Python development server.
In the first tutorial, we started a Django project and a Django app for our Wine recommender web application. The whole thing will be an incremental process that can be followed by checking out individual tags in our GitHub repo. By doing so you can work on the individual tasks at whatever stage you find most interesting or difficult.
In this second tutorial, we will add user management. This is an important part. Once we are able to identify individual users, we will be ready to generate user recommendations through machine learning.
Remember that you can follow the tutorial at any development stage by forking the repo into your own GitHub account, and then cloning it into your workspace and checking out the appropriate tag. By forking the repo you are free to change it as you like and experiment with it as much as you need. If at any point you feel like having a little bit of help with some step of the tutorial, or with making it your own, we can have a 1:1 codementor session about it.
So let’s continue with our project!
Configuring Django Authentication
From the very moment we created our project using django-admin startproject, all the modules for user authentication were activated. These consist of two items listed in our INSTALLED_APPS in settings.py:
- django.contrib.auth contains the core of the authentication framework, and its default models.
- django.contrib.contenttypes is the Django content type system, which allows permissions to be associated with models you create.
and these items in your MIDDLEWARE_CLASSES setting:
- SessionMiddleware manages sessions across requests.
- AuthenticationMiddleware associates users with requests using sessions.
- SessionAuthenticationMiddleware logs users out of their other sessions after a password change.
With these settings in place, when we ran the command manage.py migrate we already created the necessary database tables for authentication related models and permissions for any models defined in our installed apps. In fact, we can see them in the admin site, in the Users section.
But we want to do at least two things. First we want to require authentication for some of the actions in our web app (e.g. when adding a new wine review). Second we want users to be able to sign up and sign in using our web app (and not through the admin site).
Limiting access to logged-in users
For now, let’s accept that we can just create users through the admin interface. Go there and create a user that we will use in this section. If you have been using the admin interface recently, you will probably still be logged in as the admin. That’s OK for now.
The next thing we want to do is to limit the access to our add_review view so only logged-in users can use it.
The clean and elegant way of limiting access to views is by using the @login_required decorator, imported from django.contrib.auth.decorators. Modify the add_review function in reviews/views.py so it looks like this:
@login_required
def add_review(request, wine_id):
    wine = get_object_or_404(Wine, pk=wine_id)
    form = ReviewForm(request.POST)
    if form.is_valid():
        rating = form.cleaned_data['rating']
        comment = form.cleaned_data['comment']
        user_name = request.user.username
        review = Review()
        review.wine = wine
        review.user_name = user_name
        review.rating = rating
        review.comment = comment
        review.pub_date = datetime.datetime.now()
        review.save()
        # Always return an HttpResponseRedirect after successfully dealing
        # with POST data. This prevents data from being posted twice if a
        # user hits the Back button.
        return HttpResponseRedirect(reverse('reviews:wine_detail', args=(wine.id,)))
    return render(request, 'reviews/wine_detail.html', {'wine': wine, 'form': form})
We have made two modifications. The first one is to add the @login_required decorator; by doing so we allow access to this view function only to logged-in users. The second is to use request.user.username as the user name for our reviews. The request object has a reference to the active user, and this instance has a username field that we can use as needed.
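The decorator itself is a small piece of Python: it wraps the view and short-circuits to a redirect when the request’s user is not authenticated. A simplified, framework-free sketch of that mechanism (a toy version using plain dictionaries and strings, not Django’s actual implementation):

```python
import functools


def login_required(view):
    """Toy version of the decorator: redirect anonymous users, else run the view."""
    @functools.wraps(view)
    def wrapper(request, *args, **kwargs):
        if not request.get('is_authenticated'):
            return 'redirect:/accounts/login/'
        return view(request, *args, **kwargs)
    return wrapper


@login_required
def add_review(request, wine_id):
    # Only runs for authenticated requests, so request['username'] is safe to use
    return 'review added to wine %s by %s' % (wine_id, request['username'])


print(add_review({'is_authenticated': False}, 1))
# redirect:/accounts/login/
print(add_review({'is_authenticated': True, 'username': 'jane'}, 1))
# review added to wine 1 by jane
```

Django’s real decorator does the same in spirit, additionally appending the next=... parameter so the user comes back to the page they wanted after logging in.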
Since we don’t need the user name field in the form anymore, we can change that form class in reviews/forms.py as follows.
class ReviewForm(ModelForm):
    class Meta:
        model = Review
        fields = ['rating', 'comment']
        widgets = {
            'comment': Textarea(attrs={'cols': 40, 'rows': 15}),
        }
If the user is not logged in, they will be redirected to a login page. You can try this by logging out from the admin page and then attempting to add a wine review.
If you try that, you will see a Page not found (404) error, since we have defined neither a URL mapping for the login page request nor a template for it. Also notice that the URL you are redirected to includes a next=... parameter, which will be the destination page after we log in properly.
Login views
Django provides several views that you can use for handling login, logout, and password management. We will use them here. So first things first. Change the urlpatterns list in winerama/urls.py and leave it as follows.
urlpatterns = [
    url(r'^reviews/', include('reviews.urls', namespace="reviews")),
    url(r'^admin/', include(admin.site.urls)),
    url(r'^accounts/', include('django.contrib.auth.urls', namespace="auth")),
]
We just included all the URL mappings from django.contrib.auth.urls. Now we need templates for the different user management web pages. They need to be placed in templates/registration in the root folder of our Django project.
For example, create there a login.html template with the following code:
{% extends 'base.html' %}
{% load bootstrap3 %}
{% block title %}
<h2>Login</h2>
{% endblock %}
{% block content %}
<form action="{% url 'auth:login' %}" method="post" class="form">
{% csrf_token %}
{% bootstrap_form form layout='inline' %}
{% buttons %}
<button type="submit" class="btn btn-primary">
{% bootstrap_icon "user" %} Login
</button>
{% endbuttons %}
</form>
{% endblock %}
In order for our templates to be available, we need to change the TEMPLATES list in winerama/settings.py to include that folder.
TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [os.path.join(BASE_DIR, 'templates')],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
We need to create templates for each user management view. In this section we will just provide two: templates/registration/login.html and templates/registration/logged_out.html. We will also move the reviews/templates/reviews/base.html template to the project-level templates/base.html so it can be used across the whole project. Therefore we need to update all the {% extends ... %} template directives that were making use of it.
Check the GitHub repo to see what the HTML templates need to look like.
Adding session controls
The next thing we need to do is to provide the login and logout buttons in our menu. Go to templates/base.html and modify the <nav> element in the template that contains the navigation menu so it looks like the following.
<nav class="navbar navbar-default">
<div class="navbar-header">
<a class="navbar-brand" href="{% url 'reviews:review_list' %}">Winerama</a>
</div>
<div id="navbar" class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li><a href="{% url 'reviews:wine_list' %}">Wine list</a></li>
<li><a href="{% url 'reviews:review_list' %}">Home</a></li>
</ul>
<ul class="nav navbar-nav navbar-right">
{% if user.is_authenticated %}
<li><a href="{% url 'auth:logout' %}">Logout</a></li>
{% else %}
<li><a href="{% url 'auth:login' %}">Login</a></li>
{% endif %}
</ul>
</div>
</nav>
The important part here is how we use user.is_authenticated within a {% if %} expression in order to show the right menu elements: when the user is logged in, we show the logout button, and vice versa.
If you gave it a try, you probably noticed that, after logging in through the menu, you get a 404 error when navigating to the user profile page. That's fine: we haven't provided a user profile page yet. We will solve this in the next section.
Again, this point of the project corresponds to the git tag stage-1.1.
User Profile Page
Our user profile page will simply consist of the list of reviews by the logged-in user. In order to accomplish that, we will need a few things:
- We need to define the default mapping for the landing page after login (when a next parameter is not provided).
- Then we need to define a mapping for the new view we are going to add.
- We need to define a view function that returns the reviews given by a user.
- We need to define a template to render the result of the previous view.
- We need to create a menu item for this.
Let’s start by defining the default mapping. This is done at project configuration level. Go to winerama/settings.py and add the following line.
LOGIN_REDIRECT_URL = '/reviews/review/user'
Now we need to define the mappings in reviews/urls.py as follows.
urlpatterns = [
    # ex: /
    url(r'^$', views.review_list, name='review_list'),
    # ex: /review/5/
    url(r'^review/(?P<review_id>[0-9]+)/$', views.review_detail, name='review_detail'),
    # ex: /wine/
    url(r'^wine$', views.wine_list, name='wine_list'),
    # ex: /wine/5/
    url(r'^wine/(?P<wine_id>[0-9]+)/$', views.wine_detail, name='wine_detail'),
    url(r'^wine/(?P<wine_id>[0-9]+)/add_review/$', views.add_review, name='add_review'),
    # ex: /review/user/carlos/ - get reviews for a given user
    url(r'^review/user/(?P<username>\w+)/$', views.user_review_list, name='user_review_list'),
    # ex: /review/user/ - get reviews for the logged-in user
    url(r'^review/user/$', views.user_review_list, name='user_review_list'),
]
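Before wiring up the view, we can sanity-check the two new regexes with plain Python's re module, no Django required (the username carlos here is just made-up sample input):

```python
import re

# The two patterns from the urlpatterns above, minus the Django plumbing.
with_user = re.compile(r'^review/user/(?P<username>\w+)/$')
bare = re.compile(r'^review/user/$')

# A path with a username is captured by the first pattern...
match = with_user.match('review/user/carlos/')
print(match.group('username'))  # carlos

# ...while the bare path only matches the second one.
print(bool(bare.match('review/user/')))       # True
print(bool(with_user.match('review/user/')))  # False
```

Django tries the patterns in order, so a path without a username falls through the first mapping and is caught by the second.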
We specify two mappings: one is used when a username is passed, and the other when it is not. The new view needs a function in reviews/views.py, named user_review_list as defined in the URL mapping. Add the following to your views file.
def user_review_list(request, username=None):
    if not username:
        username = request.user.username
    latest_review_list = Review.objects.filter(user_name=username).order_by('-pub_date')
    context = {'latest_review_list': latest_review_list, 'username': username}
    return render(request, 'reviews/user_review_list.html', context)
As you can see, we just added a filter to the code we used for the latest reviews list, and we render a new template named user_review_list.html. We could have reused the existing review_list.html template, but we want to change the title to something more user-specific. Finally, we can decide whether or not to require users to log in for this view. Since we didn't, user reviews are public, so users who are not logged in can view them as well.
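Outside of Django, the filter-and-sort step that the ORM call performs boils down to the following plain-Python sketch (the review dicts and user names here are made-up sample data, not part of the project):

```python
from datetime import date

# Made-up sample data standing in for Review.objects.all()
reviews = [
    {'user_name': 'carlos', 'pub_date': date(2015, 8, 1), 'comment': 'Nice'},
    {'user_name': 'maria',  'pub_date': date(2015, 8, 3), 'comment': 'Great'},
    {'user_name': 'carlos', 'pub_date': date(2015, 8, 5), 'comment': 'Superb'},
]

def latest_reviews_for(reviews, username):
    # Equivalent of .filter(user_name=username).order_by('-pub_date')
    mine = [r for r in reviews if r['user_name'] == username]
    return sorted(mine, key=lambda r: r['pub_date'], reverse=True)

print([r['comment'] for r in latest_reviews_for(reviews, 'carlos')])
# ['Superb', 'Nice']
```

The difference, of course, is that the ORM translates the filter and ordering into a single SQL query instead of loading every review into memory.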
Next, we need to create the template as follows.
{% extends 'reviews/review_list.html' %}
{% block title %}
<h2>Reviews by {{ username }}</h2>
{% endblock %}
That is, we extend the review_list.html template and just define the {% block title %} in order to replace the title with the one including the user name.
Finally, let’s add the menu item for the new view. We want a link that says Hello USER_NAME next to the logout menu item. Go and change the <nav> element in templates/base.html so it looks like the following.
<nav class="navbar navbar-default">
  <div class="navbar-header">
    <a class="navbar-brand" href="{% url 'reviews:review_list' %}">Winerama</a>
  </div>
  <div id="navbar" class="navbar-collapse collapse">
    <ul class="nav navbar-nav">
      <li><a href="{% url 'reviews:wine_list' %}">Wine list</a></li>
      <li><a href="{% url 'reviews:review_list' %}">Home</a></li>
    </ul>
    <ul class="nav navbar-nav navbar-right">
      {% if user.is_authenticated %}
      <li><a href="{% url 'reviews:user_review_list' user.username %}">Hello {{ user.username }}</a></li>
      <li><a href="{% url 'auth:logout' %}">Logout</a></li>
      {% else %}
      <li><a href="{% url 'auth:login' %}">Login</a></li>
      {% endif %}
    </ul>
  </div>
</nav>
Finally, we want to be able to navigate to other users' review pages. For example, when we see the name of a user under a wine review, we want to be able to click the name and go to that user's reviews page. This means that we need to update the review_list.html and review_detail.html templates, replacing the user name text with an <a> element as follows (this is the review_detail.html template).
{% extends 'base.html' %}
{% block title %}
<h2><a href="{% url 'reviews:wine_detail' review.wine.id %}">{{ review.wine.name }}</a></h2>
{% endblock %}
{% block content %}
<h4>Rated {{ review.rating }} of 5 by <a href="{% url 'reviews:user_review_list' review.user_name %}" >{{ review.user_name }}</a></h4>
<p>{{ review.pub_date }}</p>
<p>{{ review.comment }}</p>
{% endblock %}
You can check out the stage-1.2 tag to see how these files look after the updates.
And that's it. We have created a proper user landing page. It is the same as the reviews list, but filtered to include just this user's reviews. The same view can also be used to check a specific user's reviews. This point of the project corresponds to the git tag stage-1.2.
Registration page
We already have the capability to create users, but only through the admin interface (or using code). What we want is for an unregistered user to be able to sign up and create their own user account. We could create forms and views to do this, but there is a very nice Django registration app that we can install and use for this.
Let’s start by installing the application package using our Anaconda pip as follows. Assuming we are at the folder containing the anaconda installation:
./anaconda/bin/pip install django-registration-redux
If everything goes well, we need to add the app to the INSTALLED_APPS list in winerama/settings.py as follows:
INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'bootstrap3',
    'reviews',
    'registration',
)
Add also the following values to the settings file.
ACCOUNT_ACTIVATION_DAYS = 7 # One-week activation window
REGISTRATION_AUTO_LOGIN = True # Automatically log the user in.
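The activation window is just an arithmetic check against the account's creation date. Roughly, the idea is the following sketch (an illustration of the concept, not django-registration's actual code; the dates are made up):

```python
from datetime import datetime, timedelta

ACCOUNT_ACTIVATION_DAYS = 7  # same value as in settings.py

def activation_expired(date_joined, now):
    # The activation key stops working once the window has fully elapsed.
    return date_joined + timedelta(days=ACCOUNT_ACTIVATION_DAYS) <= now

joined = datetime(2015, 9, 1)  # made-up signup date
print(activation_expired(joined, datetime(2015, 9, 5)))  # False
print(activation_expired(joined, datetime(2015, 9, 8)))  # True
```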
Once we have done this, we need to create the database tables used by the default setup. From a terminal at the project root, run the following.
python manage.py makemigrations
And then
python manage.py migrate
The application includes different user management views, but we want to use just the registration ones. Set the following in the winerama/urls.py file.
urlpatterns = [
    url(r'^reviews/', include('reviews.urls', namespace="reviews")),
    url(r'^admin/', include(admin.site.urls)),
    url(r'^accounts/', include('registration.backends.simple.urls')),
    url(r'^accounts/', include('django.contrib.auth.urls', namespace="auth")),
]
We need to provide two templates that will replace the default ones, so that ours are more in line with our site's style: templates/registration/registration_form.html and templates/registration/registration_complete.html. The first one looks like this.
{% extends 'base.html' %}
{% load bootstrap3 %}
{% block title %}
<h2>Register</h2>
{% endblock %}
{% block content %}
<form method="post" class="form">
  {% csrf_token %}
  {% bootstrap_form form layout='inline' %}
  {% buttons %}
    <button type="submit" class="btn btn-primary">
      {% bootstrap_icon "user" %} Register
    </button>
  {% endbuttons %}
</form>
{% endblock %}
There we just follow the same structure that we used for the login template. Nothing special. The registration complete template looks like this.
{% extends 'base.html' %}
{% load bootstrap3 %}
{% block title %}
<h2>Register</h2>
{% endblock %}
{% block content %}
Thanks for registering!
{% endblock %}
They have to be named that way and be located in the main templates/registration folder for django-registration to find them.
We have just used the simplest approach to building a user registration feature. If you are interested in a more complex (and production-ready) approach, for example sending activation/confirmation emails to the user or providing password recovery/reset views, have a look at the Django registration docs. The main issue with using them in this tutorial is that they involve setting up an email server, and that's not very related to what we want to teach here.
This point of the project corresponds to the git tag stage-2 of the project repo.
Conclusions
In this part of the tutorial, we have introduced users and user management into our Django app. By requiring users to register, we will be able to gather better user statistics, and this is a fundamental step into building a user-based recommender.
However, our user management is still quite naive and simple. There are many issues we would need to tackle before taking this system into production, such as providing a two-step activation process for user accounts, checking whether an email address has been used before, or allowing users to sign up with their social accounts.
But what we have done so far is enough for our final goal: to show how a web site can include a recommender system and what its workflow looks like when gathering user data. Our ultimate goal is to build models that provide recommendations, and that is the purpose of the third and last part of our tutorial.
Remember that you can follow the tutorial at any development stage by forking the repo into your own GitHub account, and then cloning into your workspace and checking out the appropriate tag. By forking the repo you are free to change it as you like and experiment with it as much as you need. If at any point you feel like having a little bit of help with some step of the tutorial, or with making it your own, we can have a 1:1 Codementor session about it.
September 01, 2015
Invent with Python
Further Reading: Intermediate Python Resources
So after reading one of my Python books (available free online here and here), you're no longer a complete beginner and would like to know where to go next. It can be hard to find intermediate-level material: stuff that isn't for total beginners or advanced computer scientists. The topics that you should google for are Python standard library, Python object oriented programming, Python idioms, and popular Python modules.
For a more concrete list of resources, here's my list of recommendations.
Continuing with Python
- The Python Module of the Week Blog covers many of the modules in Python's standard library with practical examples. The Python standard library has a wide range of handy functions ("Guido's Time Machine" refers to how requests for features in Python would often be met by Guido van Rossum mentioning he had added them the night before.)
- Python Pocket Reference is a short book intended for programmers who want to learn Python quickly. Now that you know basic programming concepts, this short book is a great way to fill out your Python knowledge and explore some more modules without spending a lot of time.
- Python 3 Object-oriented Programming is a great resource to learn specifically about classes, objects, and other OOP concepts. My books skip OOP since it isn't necessary to get started coding, but once you've been programming for a while it's a must to become familiar with these topics.
- Data science and machine learning are hot topics in the job market. Data Science from Scratch and Programming Collective Intelligence are both great introductions to these topics.
- If you'd like to learn Python well enough to become a software engineer, Effective Python: 59 Specific Ways to Write Better Python provides a nice list of advanced (but effective) topics to read up on.
Practicing Your Code-Fu
- Project Euler is a classic practice programming site, with mostly math-related problems that can be solved with code.
- The dailyprogrammer subreddit has beginner, intermediate, and advanced programming problems posted every day. (Reddit also has an FAQ with links to other resources.)
- A list of 49 games to clone is a great source of ideas. The games were selected based on their simple mechanics and not requiring a lot of artwork or level designing. (You can read a free book on Pygame here.)
Moving On to Other Languages
Python is versatile and you can keep going down that path if you choose, but don't feel that you're somehow "not ready" to tackle a new language. If you do want to move on, here's some resources for the next step.
These languages are fairly similar to each other. Java is the more popular one and a mainstay of software engineering jobs. C# is (essentially) Microsoft's version of Java, meant to create Windows applications. I don't have any recommendations as far as C# books, but Java: A Beginner's Guide is a decent intro. There have been plenty of slight changes to the Java language over the years, so you don't want to get a book that's more than a decade old or so.