Python

Python - Yahoo News Search Results

Python's Jones slams finance in 'Boom Bust Boom' documentary

By Farah Nayeri LONDON (Reuters) - Terry Jones, a founding member of the Monty Python comedy team, targets the global financial system in a provocative new documentary and warns of more meltdowns if the system isn't fixed soon. Jones, co-writer and co-director of "Boom Bust Boom," uses puppetry and animation to get the message across -- as well as interviews with economist Paul Krugman, actor ...

Monty Python celebration at Tribeca fest

Monty Python - The Meaning of Live, the feature-length documentary that provides a look into the 2014 Python reunion shows in London, is set to make its international debut at this year's Tribeca Film Festival, the festival announced Wednesday. The Monty Python cast will be on hand...

Tribeca Film Festival marks 40th anniversary of Monty Python film

A new documentary about Monty Python will premiere at the Tribeca Film Festival in New York and members of the British comedy troupe will attend a special screening of the film "Monty Python and the Holy Grail" to mark its 40th anniversary. The feature-length documentary, "Monty Python - The Meaning of Live," will debut on April 25. The five surviving Pythons, John Cleese, Terry Gilliam, Eric ...

Tribeca: Monty Python 40th Anniversary Celebration Planned

The New York film festival will host a special screening of 'Monty Python and the Holy Grail,' with members of the comedy group in attendance; premiere the documentary 'Monty Python — The Meaning of Live'; and screen other classic Python films.

Walkers find huge python in canal

A dead python measuring about 16ft (5m) is found in a Lancashire canal by walkers, prompting a police appeal for information.

Delicious/tag/Python

recent bookmarks tagged Python

Extract text from any document; no muss, no fuss. | Datascope

Posted on 18 January 2038

github.com

Posted on 29 March 2015

github.com

Posted on 29 March 2015

ryanj/flask-hbase-todos · GitHub

Posted on 29 March 2015

MongoKit in Flask — Flask Documentation (0.10)

Posted on 29 March 2015

Arduino with Django and Python | k5's Blog

Posted on 29 March 2015

MySQL-python 1.2.3 for Windows and Python 2.7, 32bit and 64bit versions | codegood

Posted on 29 March 2015

Instalando o Django com suporte ao MySQL no Windows

Posted on 29 March 2015

jdunck/python-unicodecsv · GitHub

Posted on 29 March 2015

dragplus.com

Posted on 29 March 2015

Top Answers About Python (programming language) on Quora

Top Answers About Python (programming language)

Does Google hire programmers who use Python as their main programming language?


Can you get into Google as a smart programmer whose main language is Python? Yes. That said, you're going to want to learn C++ if you're planning for a career in Google, because it's the most respected of the "Google Languages" (Go was too new for me to evaluate in that regard, when I was there) and the machine learning projects are most likely going to be using it. Python's reputation is that it's not fit for production, and Java would make you a Java programmer. (Doing Java at Google-- at least, as of my experience which is 3 years dated, but I doubt much has changed-- means actually writing Java, not Scala or Clojure, so the cool kids write C++ and you should too.) The good news is that Google C++ is a lot more civilized than C++ in the wild. Google has some rotting projects but, averaged across the company, code quality is pretty damn high there. So the C++ that you encounter will probably be better than what you'd expect in a C++ shop.

See question on Quora

Posted on 10 February 2015

How do you identify a good programmer?


Many engineers think that the quality of a programmer is their comprehension of some complex thing. Their mastery of best practices. Their dominance of killer methodologies... These are what I call "inputs". These are things programmers put into a project.

Artists don't see the world like this. Artists tend to judge other artists by their body of work.   They make a judgement, not on methodology but on output alone. They look at the outcomes.

In my view, the artists have this right.

We should judge good programmers by the outcomes:
 
Did the project ship?
Did it ship on time?
Was the program stable? - and so on. Because this is what matters more than the other stuff.

I can hear the engineers howling in protest. What about the quality of code? What about <insert fashionable technique> methodology?  Reusability, Unit tests etc.

The thing is, good programmers will indeed use those methodologies. Just as good artists will employ the right sort of brush technique.
But the truth is, so do bad programmers and bad artists.

Good programmers ship.  Bad programmers congratulate themselves about clever methodologies, while missing deadlines.

#controversial

See question on Quora

Posted on 24 January 2015

Why are IT systems in big enterprises usually built using Java, instead of Python or JavaScript?


Questions like this are usually asked by those with limited experience in large systems development, often after having just learned programming or done some small project work in the first languages they've acquired knowledge in.

For starters, neither Python nor JavaScript is a modern language compared to Java.  JavaScript is only a couple of years younger than Java, and before that it was called LiveScript.  Python is OLDER than Java by several years.

"Modern" does not mean "better".

Talking to other systems is a function of protocols, not languages.  You need to understand that distinction.  As long as you can speak the protocol, like HTTP or SOAP, you choose the language best suited for the job.

Java is NOT legacy.  It is mainstream.  And it is mainstream for a reason.  Virtually all of the common enterprise tasks one might want to perform are available in Java, either natively or through many frameworks.

It can scale and has many years behind it in how to do this.  That is huge in large enterprises with high volume systems.  JavaScript libraries are playing catch-up here, and it will likely be 5-10 years before they work out all of the kinks in doing this well, the current framework wars shake out winners, and de facto standards emerge.

What about skillsets?  Too many people who prattle on about JavaScript and Python fail to consider that.  Companies need to hire people.  The number of skilled people determines how hard and how expensive that task is going to be.  Java skills are easier to find.  Paradoxically, skilled Java developers are also harder to hire, because demand for them is so high that they don't move around much!

Java has had a long time to figure out what works well in a modern web development and service development context.  Python has a couple of well-known options like Django or TurboGears. 

JavaScript is all over the map right now playing the "let's reinvent Java because it is cool to do so" game.  JavaScript was never meant to be a back-end language, and adapting it to that role is going to take time.  Hell, .NET developers have only in the past couple of years been seeing the "new" MVC framework from Microsoft.  Java web developers are rolling on the floor howling with laughter at that one!  We were doing that 15 years ago and have used that time to refine what works and what doesn't!  Microsoft and the "modern" languages will be playing catch-up for some time.

Managing large Java codebases is well known at this point.  It inherited that from its C ancestry.  Source control, build and deployment pipelines and most development methodologies like Agile have their roots in the Java space.

Legacy?  I sneer at that.  Give me a couple of good Java developers and I can probably wipe the floor with anyone using Python or JavaScript to roll out an idea, and do it in the same amount of time and richness of capability.  With the advantage that when the idea needs to grow, I know my enterprise-grade solution will be able to scale, because I'll have engineered it that way applying years of experience.  Because I won't be reinventing the wheel when I crack out my JDBC frameworks, NoSQL frameworks, JSF2, Primefaces, Web Services components and the like.  You'll still be writing orchestration code or a front-end widget while I'm negotiating with my customers on what text their labels should have, hard work long behind me.

And I say this as someone who is fluent in Python and JavaScript and uses them in production systems.  And believe it or not, despite Java's strong typing, it is actually as dynamic in its behavior as Python or JavaScript are.  Java Reflection is the core of virtually every modern Java framework, and we've been doing runtime inspection, runtime dependency injection, dynamic typing and runtime type determination for well over a decade.  Almost every piece of Java code I write, especially in a web application, is more-or-less a dynamically typed solution (i.e. JavaBean properties and dependency-injected classes).

Languages are a means to an end.  They do not exist solely to justify themselves.  They are tools.  You're writing business solutions, not programming solutions. This is the first great epiphany you must come to understand.

See question on Quora

Posted on 18 January 2015

Why are IT systems in big enterprises usually built using Java, instead of Python or JavaScript?


At Spotify, we use Java extensively in the backend. This is not for legacy reasons, it's an active choice. We use Python too, but we have moved more and more to Java. The reason is that Java is much easier to get to perform well. Python is easy to write initially, but getting it to perform well when being hammered by 15 million paying users is another matter.

I personally don't understand how a medically sane person can like the Java syntax. However, no intelligent person can deny that the JVM is pretty darn good. It is fast, well-tested, well-documented and under active development. This cannot be said about many tools in software development.  

We used to have quite a bunch of C++ services, but while you can get C++ very fast too, it's harder to write, especially if you want the code to be maintainable. Java is a compromise that hits a sweet-spot for us.

Clojure is gaining traction at Spotify, and many new services are written in it, but it's not as widespread yet. While Clojure is certainly a better language, Java has the advantage of being non-weird. Java is an uncontroversial programming language that all experienced programmers can jump into with little effort, and that is a big advantage.


See question on Quora

Posted on 17 January 2015

Why are IT systems in big enterprises usually built using Java, instead of Python or JavaScript?


Python and JavaScript are qualitatively different languages from Java. And I would certainly argue against JavaScript being "more modern" than Java: they come from the same time.

Python and JavaScript are both interpreted languages (yes, I know Python compiles to .pyc) whereas Java is a compiled language. Java runs under the JVM, which is a carefully designed secure environment; Python and JavaScript are deliberately designed as insecure "you can do anything" languages. Java has language constructs designed to make managing large projects with millions of lines of code possible; neither Python nor JavaScript does. My experience with Python is that it is not suitable for large projects.

If I wanted a "more modern" language than Java, I would probably go for something like Scala.

See question on Quora

Posted on 15 January 2015

How many boring steps in programming were there for you, before it became exciting?


I got stuck playing Grand Theft Auto: San Andreas on my PC! I had to figure out a way to get rid of the update to unlock my old saves. I ended up messing with one piece of the source code that was open to modify; it was written in C (if only I had known what C was).

I went to buy this book (This exact copy) for $75. In North Iraq $75 is a lot of money:

My cousin, Misho, who was an actual engineer at the time, told me a few things:

  1. First, that is C++, you wanted C, those are not that close. AND no the ++ doesn't mean a better version. (My logic back then :/ )
  2. Second, this is for someone who can understand at least College level English, wait do you even know what a TextBook is Yad?
  3. Third, why did you break the game again? It doesn't look like that you are going to be playing it anytime soon.


Hence, the journey started my friend! I went on an epic mission to fix the game back! Didn't know what was coming, it got dark really fast.



I ended up making a MOD of the game and never being able to fix it. I downloaded the freely available 3D car models and added them to the game.
My first 101 programming project as a 16-year-old (it was one of the most exciting things I have ever done in my life).

When I showed the game Mod to my friends, the reaction was something like this:

For those who are interested here is what the game ended up like:
Hitman: Blood Money MOD in GTA :):

I found a Forum that gave all the Car Models for free and I gave them my Mod for free. Back then, startup tactics were that simple!

https://www.youtube.com/watch?v=...

See question on Quora

Posted on 14 January 2015

Which language is best, C, C++, Python or Java?


If you are writing an operating system, I suggest you use C.
If you are writing a very complex application where execution speed is extremely important, I suggest you use C++.
If time to market is key, but execution speed is not important, I suggest you use Python.
If your boss told you: "do it in Java or you are fired" I suggest you use Java and look for a better workplace.

See question on Quora

Posted on 7 January 2015

What is the best programming language to learn in 2014?


The answer depends on if you are looking for a job or a career.

If you just want a job, pick up any mobile development platform: iPhone (Objective-C or Swift) or Android (Java).  The platform is more important than the language, as are the basic UI development skills.

If you want a career, become either a full-stack developer or a data scientist.  Either way, start by picking up a copy of Structure and Interpretation of Computer Programs and reading it cover to cover.  It's the only text book I ever read cover to cover (twice in fact).  It was published in 1985, but teaches the core concepts that once you grok you'll find picking up other languages to be fairly easy.

Mobile development is hot right now and probably will be so for a while, but even the iPhone will get replaced (or significantly rebuilt) at some point.  I find the hardest thing to find are full-stack developers for mobile.  Having the server-side skills is important for any real app today, and the server-side skills will probably outlast the client-side (UI) skills.  You'll find a lot more developers on the server-side at Google or Facebook.

Data is king these days and most developers don't know s..t about working with it.  They know how to store and retrieve data, but they don't know how to really extract information from it.  Being a data scientist is about a lot more than relational databases (the biggest data sets aren't relational anymore), it's about statistics, modeling and algorithmic manipulation of large data sets efficiently.  Data science is about answering questions with data; oftentimes the hardest part is figuring out the questions that will be asked of a given data set and designing the applications that feed into that data set to collect the right data in the first place (I might be able to infer people's ages given enough additional data about them, but it's a lot easier if I just have their date of birth).

See question on Quora

Posted on 4 January 2015

Do you really need to learn C to learn C++, Java and Python?


No, you don't. Many introductory programming courses are taught in Java or Python, and no knowledge of C is expected. My guess is that most practicing programmers in Java and Python would take quite some time to become productive in C. (Not sure about C++.)

That being said, if you learn C, you will learn some important low-level details about how, for example, data are stored in memory and how memory is managed. This may help you understand design decisions and performance characteristics associated with other languages.

See question on Quora

Posted on 3 January 2015

Why do we return 0 to the OS when we exit with no errors, but boolean functions within the code generally return 1 (true) to indicate all is fine?


Because the exit code is answering the question "were there any problems?" as opposed to "was the program successful?".

Moreover, it's actually more than just a boolean: the number returned is a code that can specify what sort of error it was. Depending on the program, an exit code of 1 can be very different from 255.

The neat thing is that this approach works as both a boolean and a richer code, at least in C. By answering what is, in essence, a negative question, it elegantly covers two different use cases at once.
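As a rough illustration of the two conventions side by side (a sketch, not from the original answer; the script contents are arbitrary):

import subprocess
import sys

# Exit-code convention: 0 means "no problems"; a non-zero value encodes what went wrong.
code = subprocess.call([sys.executable, "-c", "import sys; sys.exit(0)"])
print(code)   # 0 -> success

code = subprocess.call([sys.executable, "-c", "import sys; sys.exit(2)"])
print(code)   # 2 -> a specific failure code chosen by the program

# Boolean convention inside the code: truthy (1/True) means "yes, all is fine".
def all_is_fine():
    return True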

See question on Quora

Posted on 11 December 2014

What should I learn C++ or Python?


You will hear a lot of suggestions to learn C and C++ first.  Those folks might be sincere, or they might just be trying to keep you out of the job market and competing with them.


Most schools have gone to teaching Java or Python, after a few horrible decades teaching C and C++.    C and C++ are like running chainsaws, and you don't hand a running chainsaw to a blindfolded toddler.

If you learn Python first, you will avoid thousands of gumption-traps in C and C++.   Random crashes.  Incomprehensible compile errors.  Undebuggable crashes on code that looks just perfect.  Pages of gibberish for error messages on your first attempt to use the std stuff.    Just a horrible place to learn anything.

The only up-side is that if you can learn C and C++ and write non-trivial working programs, that's QUITE an accomplishment.   Anything else will seem easy by comparison.

So, you decide: learn to swim in the shallow end of the pool, or the 5,000-foot deep end?   You decide.   No lifejackets either.

See question on Quora

Posted on 21 November 2014

Does Python have a future?


from __future__ import braces  # Seriously, try it!


Python is one of the most widely adopted general-purpose modern scripting languages. (JavaScript is used more by pure volume, but is fairly rare for anything other than web pages.) It seems a pretty safe bet that Python is here to stay over the next decade.

Longer term there's no way to know for sure with any language, but it's not really worth worrying about because IF Python is ever supplanted by a newer shinier language, it's almost guaranteed that most of the things you learn in Python will transfer easily to the new language.

When you're learning Python (or any language), the specifics of the language are not where most of your effort will be spent anyway. Instead you'll learn a lot of general principles, and those apply no matter what language you'll use for implementing them a few decades from now.

See question on Quora

Posted on 4 November 2014

Why is it easier to learn a programming language once you already know one?


Because
  1. Programming languages often address the same challenges, sometimes in similar ways. So, you know what to expect and can understand things in comparison.
  2. Programming languages share some of their core concepts and some fraction of their syntax.
  3. Your brain adapts to the process of learning a new language.
  4. The mechanics of using programming languages are not very language-dependent (at least within the same language type), so you become more efficient in completing basic tasks, such as editing source code, and in debugging.


See question on Quora

Posted on 29 October 2014

As a starting Python programmer I see a lot of praise for the Python language (and so far I can only agree). Isn't there anything bad to say about it? What is a real con?


While the good outweigh the bad for most cases, there are definitely some bad parts about Python:

- It's slow. Being both dynamically typed and interpreted means that performance takes a hit.

- The Global Interpreter Lock (GIL) makes it hard to do advanced operations with asynchronous programming.

- `print` doesn't require parentheses (in Python 2 it is a statement, not a function), which is inconsistent with the rest of the language.

- Though everything is an object, there are a number of builtin functions which make the language inconsistent. For example `[1, 2, 3].len` would be more consistent than `len([1, 2, 3])`. Ruby is not inconsistent in this way.

- {'a': 1, 'b': 2} is how you define a dictionary. {1, 2} is how you define a set. What does {} mean? (A dictionary. Dictionary notation came first.)

- (1, 2,) defines a tuple. (1, 2) defines a tuple. (1,) defines a tuple. (1) is just 1. (,) is a syntax error. () defines a tuple.

-  ({} == []) != (bool({}) == bool([]))

- Since False == 0 and since Python is dynamically typed, you can do nonsensical operations like [1, 2, 3] * False (which equals []). In a sane typed language, that would probably throw an error.

- There is no good way to add infix operators.

- Python sets hard limits on the stack height, which can be a problem if you are doing anything recursive in nature. Technically, you can rewrite recursive algorithms in an iterative form to get around this issue, but this is impractical for complex functions like the Ackermann function.
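A quick way to see that limit (the default of roughly 1000 is typical for CPython and can be raised with sys.setrecursionlimit; this snippet is illustrative, not from the original answer):

import sys
print(sys.getrecursionlimit())   # typically 1000

def depth(n):
    return 0 if n == 0 else 1 + depth(n - 1)

depth(10)       # fine
depth(100000)   # RuntimeError: maximum recursion depth exceeded (RecursionError in Python 3)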

- Guido dislikes reduce (a higher-order function) and its use is discouraged. Unfortunately, this means that anytime you have a function which could benefit from being abstracted, you cannot abstract it. This is basically a forced design pattern.

- There is no way to express a `do-while` loop, which forces you into workaround patterns (see the sketch below).
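A minimal sketch of the usual workaround, a `while True:` loop with a `break` (handle_line is a hypothetical function):

# Emulating do-while: the body always runs at least once.
while True:
    line = raw_input("> ")   # input() in Python 3
    handle_line(line)        # hypothetical handler
    if not line:
        break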

- Python's packaging system is extremely complicated. For starters, you need to install pip yourself using easy_install. Distribution is simple enough to get started with, but if you need to do anything complex you will need to study the history of the various options (distutils, setuptools, etc).

- Certain libraries in the standard library are showing their age, but there is nothing in the documentation that makes that clear. For example, imaplib fails to intelligently parse responses in the IMAP protocol, meaning that you sometimes need to construct data structures from strings representing lists. (imaplib was written in 1997, before Python 1.5 was released.)

- List comprehensions leak scope. For example, `[x for x in xs]` will put `x` in scope (in Python 2; Python 3 gives comprehensions their own scope). This can be dangerous if you had previously defined `x`.
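For example (Python 2 behaviour):

x = 'something important'
squares = [x * x for x in range(3)]
print(x)   # prints 2 -- the comprehension has clobbered the outer x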

- Python relies heavily on idioms. To the master this is no problem, but to the novice you have to discover the idioms. For example, to repeat a block `n` times you want to know the idiom `for _ in range(n):`.

- Multi-line `if` statements are hard to read. Consider:

if (collResv.repeatability is None or
    collResv.somethingElse):
    collResv.rejected = True
    collResv.rejectCompletely()


- Subclass relations aren't transitive. [1]
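One concrete chain that breaks transitivity (the Hashable case; behaviour of CPython 2.7, similar in 3.x via collections.abc):

from collections import Hashable   # collections.abc.Hashable in Python 3

print(issubclass(list, object))       # True
print(issubclass(object, Hashable))   # True
print(issubclass(list, Hashable))     # False -- list sets __hash__ to None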

- When using byte strings and Unicode strings, Python does implicit conversions which can be confusing if you don't know what's happening. For example,

>>> "Hello " + u"World"
u'Hello World'
>>> "Hello\xff " + u"World"
Traceback (most recent call last):
    ...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 5: ordinal not in range(128)


Also:

>>> "foo" == u"foo"
True
>>> "foo\xff" == u"foo\xff"
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False


[1]: Python Subclass Relationships Aren't Transitive

See question on Quora

Posted on 19 October 2014

Why is tail recursion optimisation not implemented in languages like Python, Ruby, and Clojure? Is it just difficult or impossible?


There are different issues at play for the various languages.

JVM

For Clojure, the difficulty is with the JVM that does not support proper tail calls at a low level. It is the Java Virtual Machine, after all! People almost never use recursion in Java. I think Clojure could have proper tail calls if it used its own calling convention, but it wants to be easily compatible with existing Java code and so sticks with the standard Java calling convention.

Scala, also being on the JVM, has a very similar story to Clojure. The difference is that the Scala compiler is intelligent enough to optimize a directly recursive tail call into a loop. However, this still misses out on mutually recursive functions as well as other, potentially non-recursive uses for tail calls. This makes certain functional programming patterns, like continuation passing style and certain monads, much harder to use effectively.

There's been some thought towards supporting a `tailcall invoke` instruction in the JVM bytecode for full proper tail calls. This proposal would rectify the problem with implementing tail calls, but I'm not sure how likely it is to actually get implemented.

Besides technical issues on the JVM, people in the Clojure community also think that making tail calls explicit is important from a language design point of view. It's like an extra check to ensure that the code you expect to have a proper tail call actually does. Personally, this reasoning feels a bit like "sour grapes" to me, especially since their approach with `recur` doesn't scale to more advanced functional patterns, just like Scala.

Other

Rust also doesn't support proper tail calls because of its calling convention. Rust aims to be fully compatible with C, and the C calling convention was naturally not designed with tail calls in mind! Rust could not maintain tail calls, C compatibility and satisfactory performance at the same time, so tail calls had to go.

Another common reason to avoid tail calls is that they compromise stack traces. The whole point of a proper tail call is not to use the stack, so standard debugging tools that rely on stack frames won't work properly. This makes languages that heavily rely on stack traces for debugging leery of tail calls: some languages really like their stack traces...

This is a difficult problem. It's certainly possible to instrument tail calls in useful ways, but it's very hard to do that without sacrificing performance and without breaking compatibility with other languages and tools.

Stack traces are one of the reasons that Python does not support tail calls. However, the main reason is that Guido does not like recursion (!) and wants to encourage people not to use it. I don't even want to know what he thinks about continuation-passing style!
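To make this concrete, a minimal sketch (not from the answer): a tail-recursive function blows CPython's stack, and the usual workaround is to rewrite it as a loop or a simple trampoline.

def countdown(n):
    return "done" if n == 0 else countdown(n - 1)   # tail call, but a new frame is pushed anyway

# countdown(10 ** 5) would raise "maximum recursion depth exceeded".

def trampoline(f, *args):
    # Repeatedly call thunks until a non-callable result is produced.
    result = f(*args)
    while callable(result):
        result = result()
    return result

def countdown_t(n):
    return "done" if n == 0 else (lambda: countdown_t(n - 1))   # return a thunk instead of recursing

print(trampoline(countdown_t, 10 ** 5))   # "done", constant stack depth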

Some languages don't support tail calls just because they're hard to implement. I think this is the case for Ruby: it wants to support a bunch of different platforms like JRuby and so on, but tail calls on the JVM are difficult (as covered above). So instead, Ruby has tail call elimination on some implementations but not others, which means you can't really rely on it.

JavaScript doesn't support tail calls because it's a complete mess of a language with a dysfunctional standardization process. Happily, they're slated for ES6, so future versions will actually have them! It's already beating the rest of the languages I talked about, most of which should feel ashamed.

So: languages don't support tail calls for a bunch of reasons. Some of these are practical effects of legacy tools and software, others are products of short-term thinking. To a large extent, it's also because there has not been much pressure for it: the deviants that actually wanted tail calls could always go over to a real functional language like Scheme or ML or Haskell. Happily, this last one is changing as functional programming gains more and more market share.

See question on Quora

Posted on 22 June 2014

Will Python suffer the same fate as Perl?


Not yet.

I think it turns out that Python 3 was a bad move strategically. But it's not the disaster that Perl 6 was because it noticeably "exists". Whereas Perl 6 was vapourware for a long time. And Python 2.7 and 3.x continue to develop similar libraries in parallel.

Worse still for Perl 6: its first implementation was written in Haskell, which got Perl programmers thinking about Haskell. After which there were fewer Perl programmers.

So I don't think that Python programmers are going to fall out through the gap between 2.x and 3.x.

Still, it's a regrettable confusion. I suspect Python will continue with people recognising that it comes in two different "dialects" much as people accepted that there were different dialects of BASIC. And eventually one will just quietly die.

See question on Quora

Posted on 14 June 2014

What does one mean by 'elegant' code?


It's very closely related to elegance in mathematics.

Elegant code is simple, gives you some new insight and is generally composable and modular. These qualities, although they may look almost arbitrary, are actually deeply related, practically different facets of the same underlying idea.

Simplicity

The biggest one, perhaps, is simplicity. But remember: simple is not the same thing as easy. Just because some code is very simple does not mean it is easy for you to understand. Easiness is relative; simplicity is absolute. 

This is especially relevant for Haskell: often, the most elegant Haskell code comes from simplifying a problem down to a well-known, universal abstraction, often borrowed from math. If you're not familiar with the abstraction, you might not understand the code. It might take a while to get it. But it is still simple.

Simple code like this is also often concise, but this is a matter of correlation, not causation. It goes in one direction: most elegant code is concise, but much concise code is not elegant.

One way of thinking about simplicity is that there are fewer "moving parts", fewer places to make mistakes. This is why many of Haskell's abstractions are so valuable—they restrict what you can possibly do, precluding common errors and shrinking the search space.

Consider the difference between mapping over a list and using a for-loop: with the loop, you could mess up the indexing, have an off-by-one error or even be doing something completely different like iterating over multiple lists at once or just repeating something n times. With a map, there's only one possible thing you can be doing: transforming a list. Much simpler! It leaves you with fewer places to make a mistake and code that's easier to read at a glance, since you immediately know the "shape" of the code when you see `map`.

In fact, that's probably my favorite test for simplicity: given that I'm familiar with the relevant abstractions and idioms, how easy is the code to read at a glance? Code is read more often than it's written, but it's skimmed even more often than it's read. That makes the ability to quickly get the gist of an expression—without having to understand all the details—incredibly useful.

Insight

Another thing that elegant code does is give you a new insight on its domain.

Sometimes, this is a surprising connection between two things that seemed disparate. Sometimes it's a new way of thinking about the problem. Sometimes it's a neat idiom that captures a pattern that is normally awkward. Almost always, it's an idea that you can apply to other code or a common pattern you've already seen elsewhere.

Beyond the immediately practical reasons, mostly illustrated in the "simplicity" section, this is why I'm so drawn to elegant code:  it's the best way to learn new things. And these things, thanks to their simplicity and generality, tend to be pretty deep. Not just pointless details.

Elegant code also displays the essence of the problem it's solving. It's a clear reflection of the deeper structure underlying either the solution or the problem space, not just something that happened to work. If your problem has some sort of symmetry, for example, elegant code will somehow show or take advantage of it. This is why that QuickSort example—which, unfortunately, has some problems of its own—gets trotted out so often. It does a marvellous job of reflecting the structure, and especially the symmetry, of QuickSort which the imperative version largely obscures in implementation detail. The key line `quicksort lesser ++ [p] ++ quicksort greater` reflects the shape of the resulting list.

Composability

The final characteristic of elegant code, especially elegant functional code, is composability and modularity. It does a great job of finding the natural stress lines in a problem and breaking it into multiple pieces. In some ways, this is just the same point all over: elegant code gets at the structure of what it's doing.

Really elegant code combines this with giving you a new insight and letting you split a problem into two parts that you thought inseparable. This is where laziness really shines, coincidentally.

A great example of this is splitting certain algorithms into two phases: constructing a large data structure and then collapsing it. Just think of heapsort: build a heap, then read elements out of it. That particular algorithm is elegant on its own, and is pretty easy to implement directly in two parts. For many other algorithms, the only way to separate them and maintain the same asymptotic bounds is to construct and fold the data structure lazily.
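As a very rough Python-flavoured illustration of that split (a generator stands in for laziness here; this is only a sketch, not the Haskell approach the answer has in mind):

import heapq
from itertools import islice

def heapsorted(xs):
    heap = list(xs)
    heapq.heapify(heap)            # phase 1: build the heap, O(n)
    while heap:
        yield heapq.heappop(heap)  # phase 2: hand out elements only on demand

# Consuming only the three smallest items never pays for sorting the rest.
print(list(islice(heapsorted([9, 4, 7, 1, 8, 2]), 3)))   # [1, 2, 4]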

Conal Elliott has a great talk about this which is well worth a look. It includes some specific examples of splitting up algorithms that seem inseparable into a fold and an unfold—most of which only work lazily.

I think modularity is one of the best ways to avoid bugs and, to illustrate, I'm just going to reuse the same pictures. The first represents code that's less modular; the second represents code that's more modular. You can see why I'd find the second one more elegant!

Imagine these graphs to be parts of your code with actual, or potential, interconnections between them. If all your code is in one big ball, then every part could potentially depend on every other part; if you manage to split it into two modules with clear module boundaries, the total number of possible interconnections goes way down.
(The first picture: not very modular, pretty complex—not very elegant. The second: simpler and more elegant.)

An Example

But that was all pretty abstract. So let me give you an example that captures all of these ideas and neatly illustrates elegance.

Let's say we have a bunch of records containing book metadata:
data Book = Book { author, title :: String
                 , date :: Date
                 {- ... -}
                 }


We want to sort our book collection, first by author, then by title, then by date. Here's the really elegant way to do it:
sortBy (comparing author <> comparing title <> comparing date)

We can use `comparing` to turn each field into a comparison function of type `Book -> Book -> Ordering` and then use the monoid operator `<>` to combine these comparison functions.

It does exactly what you expect it to—but if you're not familiar with monoids and the `Ordering` type, you might not know why it does what you expect.

On the other hand, there is the really explicit version which replaces each `<>` with pattern-matching on `EQ`, `LT` and `GT`. To somebody who's not familiar with the relevant abstractions, this might be easier to read—but it's also more complex and noisy. Less elegant.

This example is simple because it neatly abstracts over all the plumbing needed to combine the comparison functions. It's very easy to tell, at a glance, exactly which fields we're sorting by and with what priorities.

It's insightful because it takes advantage of the natural way to combine `Ordering` values—the way they form a monoid. Moreover, going from the `Ordering` monoid to the `Book -> Book -> Ordering` monoid is actually also free—if we know how to combine any type `o`, we know how to combine functions `a -> o`. So the abstraction that hid the plumbing? We got most of that for free, from libraries that are not specific to `Ordering` at all!

Finally, this version is definitely more modular and composable than the alternatives. It's very easy to mix and match different comparison functions with this pattern. We can trivially extract parts of them to be their own functions. It's very easy to refactor. All good things.
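For comparison, here is a rough Python analogue of the same multi-key sort, using a key tuple rather than composed comparison functions (Book is just a hypothetical record type):

from collections import namedtuple

Book = namedtuple('Book', ['author', 'title', 'date'])   # hypothetical record type

books = [Book('B. Author', 'Some Title', 2002), Book('A. Author', 'Another Title', 2001)]
books.sort(key=lambda b: (b.author, b.title, b.date))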

Hopefully that's a nice illustration of what people mean by elegant and why it comes up often in languages like Haskell.

See question on Quora

Posted on 25 May 2014

Do I count as spoiled if I'm starting to find Python ugly?


Not at all. Python is very popular, but it isn't a particularly well designed language. Honestly, it's woefully overrated. Depending on who's using it, it got popular because, for high-level tasks, it's better than Java or better than Perl or even just better than C++: not exactly a high bar to clear.

Jesse Tov wrote a great post about Python's design issues, which is well worth a read: Jesse Tov's answer to What are the main weaknesses of Python as a programming language?.

See question on Quora

Posted on 3 May 2014

How hard is it to learn Java if I already know how to program in Python?


Java in my opinion has a more explicit and easier to learn syntax.  No slices.  No implicit looping.  Not many operators at all.  Anonymous classes (similar to lambda functions) are maybe the most obscure syntactical feature, but it's not hard to figure those out.  For the most part it's just scalars, objects, and methods.

I would definitely say it's easier to learn Java than Python.

See question on Quora

Posted on 16 April 2014

How do I become a data scientist?


Here are some amazing and completely free resources online that you can use to teach yourself data science.


Besides this page, I would highly recommend the Quora Data Science FAQ as your comprehensive guide to data science! It includes resources similar to this one, as well as advice on preparing for data science interviews. Additionally, follow the Quora Data Science topic if you haven't already to get updates on new questions and answers!

Fulfill your prerequisites


Before you begin, you need Multivariable Calculus, Linear Algebra, and Python.

If your math background is up to multivariable calculus and linear algebra, you'll  have enough background to understand almost all of the probability / statistics / machine learning for the job.

Multivariate Calculus: https://www.quora.com/What-are-the-best-resources-for-mastering-multivariable-calculus
Numerical Linear Algebra / Computational Linear Algebra / Matrix Algebra: Linear Algebra, Coursera (starts 2/2/2015)

Multivariate calculus is useful for some parts of machine learning and a lot of probability. Linear / Matrix algebra is absolutely necessary for a lot of concepts in machine learning.

You also need some programming background to begin, preferably in Python. Most other things on this guide can be learned on the job (like random forests, pandas, A/B testing), but you can't get away without knowing how to program!

Python is the most important language for a data scientist to learn. Check out the related Quora questions for some reasoning behind that.

To learn Python, check out How do I learn Python?
For general advice on learning how to program, check out How do I learn to code?

If you're currently in school, take statistics and computer science classes. Check out What classes should I take if I want to become a data scientist?


Plug Yourself Into the Community


Check out Meetup to find some that interest you! Attend an interesting talk, learn about data science live, and meet data scientists and other aspirational data scientists!

Start reading data science blogs and following influential data scientists!

Setup your tools

  • Install Python, iPython, and related libraries (guide)
  • Install R and RStudio (I would say that R is the second most important language. It's good to know both Python and R)
  • Install Sublime Text

Learn to use your tools


Learn Probability and Statistics


Be sure to go through a course that involves heavy application in R or Python.


Complete Harvard's Data Science Course


This course is developed in part by a fellow Quora user, Professor Joe Blitzstein. Note that I recommend completing the 2013 version of the class instead of the 2014 version.

Intro to the class

Lectures and Slides


Assignments


Labs


Do most of Kaggle's Getting Started and Playground Competitions


I would NOT recommend doing any of the prize-money competitions. They usually have datasets that are too large, complicated, or annoying, and are not good for learning (Kaggle.com)

Start by learning scikit-learn, playing around, reading through tutorials and forums at Data Science London + Scikit-learn for a simple, synthetic, binary classification task.

Next, play around some more and check out the tutorials for Titanic: Machine Learning from Disaster with a slightly more complicated binary classification task (with categorical variables, missing values, etc.)

Afterwards, try some multi-class classification with Forest Cover Type Prediction.

Now, try a regression task Bike Sharing Demand that involves incorporating timestamps.

Try out some natural language processing with Sentiment Analysis on Movie Reviews

Finally, try out any of the other knowledge-based competitions that interest you!

Learn Some Data Science Electives



Do a Capstone Product / Side Project


Use your new data science and software engineering skills to build something that will make other people say wow! This can be a website, new way of looking at a dataset, cool visualization, or anything!


Code in Public


Create public GitHub repositories, make a blog, and post your work, side projects, Kaggle solutions, insights, and thoughts! This helps you gain visibility, build a portfolio for your resume, and connect with other people working on the same tasks.

Get a Data Science Internship or Job



Check out What is the Data Science topic FAQ? for more discussion on internships, jobs, and data science interview processes!



Book Recommendations


These three books are available as free pdfs at:


Check out more specific versions of this question:


Think like a Data Scientist


In addition to the concrete steps I listed above to develop the skillset of a data scientist, I include seven challenges below so you can learn to think like a data scientist and develop the right attitude to become one.

(1) Satiate your curiosity through data


As a data scientist you write your own questions and answers. Data scientists are naturally curious about the data that they're looking at, and are creative with ways to approach and solve whatever problem needs to be solved.

Much of data science is not the analysis itself, but discovering an interesting question and figuring out how to answer it.

Here are two great examples:

Challenge: Think of a problem or topic you're interested in and answer it with data!


(2) Read news with a skeptical eye


Much of the contribution of a data scientist (and why it's really hard to replace a data scientist with a machine) is that a data scientist will tell you what's important and what's spurious. This persistent skepticism is healthy in all sciences, and is especially necessary in a fast-paced environment where it's too easy to let a spurious result be misinterpreted.

You can adopt this mindset yourself by reading news with a critical eye. Many news articles have inherently flawed main premises. Try these two articles. Sample answers are available in the comments.

Easier: You Love Your iPhone. Literally.
Harder: Who predicted Russia’s military intervention?

Challenge: Do this every day when you encounter a news article. Comment on the article and point out the flaws.


(3) See data as a tool to improve consumer products


Visit a consumer internet product (probably one that you know doesn't do extensive A/B testing already), and then think about their main funnel. Do they have a checkout funnel? Do they have a signup funnel? Do they have a virality mechanism? Do they have an engagement funnel?

Go through the funnel multiple times and hypothesize about different ways it could do better to increase a core metric (conversion rate, shares, signups, etc.). Design an experiment to verify if your suggested change can actually change the core metric.

Challenge: Share it with the feedback email for the consumer internet site!

(4) Think like a Bayesian


To think like a Bayesian, avoid the Base rate fallacy. This means to form new beliefs you must incorporate both newly observed information AND prior information formed through intuition and experience.

Checking your dashboard, user engagement numbers are significantly down today. Which of the following is most likely?

1. Users are suddenly less engaged
2. Feature of site broke
3. Logging feature broke

Even though explanation #1 completely explains the drop, #2 and #3 should be more likely because they have a much higher prior probability.
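A toy Bayes calculation makes the point (all the numbers below are made-up priors and likelihoods, purely for illustration):

# Hypotheses: users disengaged, a site feature broke, the logging broke.
priors      = {'disengaged': 0.01, 'feature_broke': 0.10, 'logging_broke': 0.10}
# P(big drop in the dashboard | hypothesis) -- assumed values.
likelihoods = {'disengaged': 0.90, 'feature_broke': 0.60, 'logging_broke': 0.70}

evidence = sum(priors[h] * likelihoods[h] for h in priors)
posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}
print(posteriors)   # the two "something broke" hypotheses dominate despite the weaker fit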

You're in senior management at Tesla, and five of Tesla's Model S's have caught fire in the last five months. Which is more likely?

1. Manufacturing quality has decreased and Teslas should now be deemed unsafe.
2. Safety has not changed and fires in Tesla Model S's are still much rarer than their counterparts in gasoline cars.

While #1 is an easy explanation (and great for media coverage), your prior should be strong on #2 because of your regular quality testing. However, you should still be seeking information that can update your beliefs on #1 versus #2 (and still find ways to improve safety). Question for thought: what information should you seek?

Challenge: Identify the last time you committed the Base rate fallacy. Avoid committing the fallacy from now on.

(5) Know the limitations of your tools


“Knowledge is knowing that a tomato is a fruit, wisdom is not putting it in a fruit salad.” - Miles Kington

Knowledge is knowing how to perform an ordinary linear regression, wisdom is realizing how rarely it applies cleanly in practice.

Knowledge is knowing five different variations of K-means clustering, wisdom is realizing how rarely actual data can be cleanly clustered, and how poorly K-means clustering can work with too many features.

Knowledge is knowing a vast range of sophisticated techniques, but wisdom is being able to choose the one that will provide the most amount of impact for the company in a reasonable amount of time.

You may develop a vast range of tools while you go through your Coursera or EdX courses, but your toolbox is not useful until you know which tools to use.

Challenge: Apply several tools to a real dataset and discover the tradeoffs and limitations of each tool. Which tools worked best, and can you figure out why?

(6) Teach a complicated concept


How does Richard Feynman distinguish which concepts he understands and which concepts he doesn't?

Feynman was a truly great teacher. He prided himself on being able to devise ways to explain even the most profound ideas to beginning students. Once, I said to him, "Dick, explain to me, so that I can understand it, why spin one-half particles obey Fermi-Dirac statistics." Sizing up his audience perfectly, Feynman said, "I'll prepare a freshman lecture on it." But he came back a few days later to say, "I couldn't do it. I couldn't reduce it to the freshman level. That means we don't really understand it." - David L. Goodstein, Feynman's Lost Lecture: The Motion of Planets Around the Sun

What distinguished Richard Feynman was his ability to distill complex concepts into comprehensible ideas. Similarly, what distinguishes top data scientists is their ability to cogently share their ideas and explain their analyses.

Check out Edwin Chen's answers to these questions for examples of cogently-explained technical concepts:

Challenge: Teach a technical concept to a friend or on a public forum, like Quora or YouTube.

(7) Convince others about what's important


Perhaps even more important than a data scientist's ability to explain their analysis is their ability to communicate the value and potential impact of the actionable insights.

Certain tasks of data science will be commoditized as data science tools become better and better. New tools will make obsolete certain tasks such as writing dashboards, unnecessary data wrangling, and even specific kinds of predictive modeling.

However, the need for a data scientist to extract out and communicate what's important will never be made obsolete. With increasing amounts of data and potential insights, companies will always need data scientists (or people in data science-like roles), to triage all that can be done and prioritize tasks based on impact.

The data scientist's role in the company is to serve as the ambassador between the data and the company. The success of a data scientist is measured by how well he/she can tell a story and make an impact. Every other skill is amplified by this ability.

Challenge: Tell a story with statistics. Communicate the important findings in a dataset. Make a convincing presentation that your audience cares about.



If you liked this answer, please consider:

  1. Clicking "Want Answers" to What is the Data Science topic FAQ? and this question to get notifications of updates!
  2. Following me (William Chen) and my Quora blog at Storytelling with Statistics to get notified when I post more content like this!
  3. Sharing this page with your friends and followers via facebook / twitter / linkedin / g+ etc.!


See question on Quora

Posted on 18 March 2014

What are some common mistakes that could slow down one's Python scripts?


Using loops instead of list comprehensions.

Quick example:
def forloop():
    L = []
    for i in xrange(100):
        L.append(i**2)

def listcomp():
    L = [i**2 for i in xrange(100)]

if __name__ == '__main__':
    import timeit
    print 'for-loop =', timeit.timeit("forloop()", setup="from __main__ import forloop")
    print 'list comp =', timeit.timeit("listcomp()", setup="from __main__ import listcomp")


Outputs:
for-loop = 10.033878088
list comp = 6.61429381371


See comments for details on e.g. `map`.
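For reference, the `map` variant mentioned in the comments would look roughly like this (a sketch in the same Python 2 style as the answer; timings depend on whether you pass a lambda or a built-in):

def mapcomp():
    L = map(lambda i: i**2, xrange(100))

# Timed the same way:
# print 'map =', timeit.timeit("mapcomp()", setup="from __main__ import mapcomp")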

See question on Quora

Posted on 27 February 2014

Can a high-level language like Python be compiled thereby making it as fast as C?


Yes it can. In fact, many high-level languages are compiled like that including Common Lisp, Scheme, OCaml and Haskell.

But you have to keep something in mind: C is not all that fast. Rather, C is easy to optimize.

This is an important difference: if you just write naïve C code, it won't be fast. It won't be terribly slow--certainly not as slow as Python--but it won't be anywhere close to the speed of optimized C.

C doesn't magically make your code fast. Rather, C exposes enough low-level details to make optimizing possible. It takes an expert in performance--one who is constantly thinking about cache behavior, register blocking, memory layout and so on--to write truly fast C code. And C doesn't even help all that much; it just makes all this possible in the first place.

For example, you could just compile your high-level program to C directly. But just because you're outputting C does not mean you're anywhere near the speed C can offer. And, in fact, this is exactly what happens with compilers like CHICKEN Scheme: they turn high-level code into C, but the result isn't nearly as good as handwritten C can be.

To actually rival C, your compiler would have to not just compile down to assembly but also optimize really cleverly. You would have to compete with both the optimizations C compilers already perform and the hand-optimization of experts. And, right now, we don't have any systems that can really do this in the general case.

There have been research projects like Stalin Scheme (it brutally optimizes) which could beat even hand-written C in some cases. But this comes with significant compiler complexity, really long compile times and prevents separate compilation--enough problems to basically kill Stalin. There have also been projects that can generate really fast code for specific tasks or really short programs. But nothing general.

So: yes, you can compile high-level languages. And, in one sense, they would be as fast as C. But they will still not be as optimizable as C, so hand-written C will still trounce your high-level programs.

See question on Quora

Posted on 2 January 2014

Why did Google move from Python to C++ for use in its crawler?


To add to the good answers above: my starter project was to make the crawler follow particular kinds of redirects, and since the Python crawler was still in use but the C++ ("google2") crawler was soon to launch, I got to implement the change in both. I saw several issues.

1. Neither crawler had unit tests, and their "system tests" were minimal at best, absent at worst. For the C++ crawler, that led to occasional subtle bugs like failing to respect robots.txt. For the Python crawler -- Python not having a static type checker -- that led to dumb crashes all over the place, as a new code path was executed and some simple error manifested itself. The Python philosophy seems to be: "As long as your unit tests cover every possible code path, who needs a static type checker?"

2. Python (1.2 IIRC) would occasionally just core dump while running the crawler. It was completely stock, no C++ modules compiled in or dynamically linked, just bog standard. No matter how much you hate debugging your C++ core dump, you'll hate debugging a Python interpreter core dump way more.

3. The crawler's basic flow was to take the next best URL to crawl, issue network requests to receive the page, parse the page to find outbound links, send the page to storage, and recalculate PageRank given the new outbound links. In the C++ crawler, the parser and PageRank calculation were dramatically faster, with not a lot of growth in lines of code.

4. If Jeff Dean and Sanjay Ghemawat are writing some of your basic code, as they were, and if they ptrace your crawler and find that various expensive disk flushes are happening and it's not clear why, because none of the code you've written seems to be flushing, don't argue with them, and do let them use a language that exposes exactly which expensive syscalls are happening when.

5. That said, there was originally some controversy about the switch. However, when the new system was turned on and used fewer machines to crawl 5x faster with higher reliability, the practical question was settled.

See question on Quora

Posted on 20 November 2013

How do I become a data scientist?


Become a Data Scientist by Doing Data Science


The best way to become a data scientist is to learn - and do - data science. There are many excellent courses and tools available online that can help you get there.

Here is an incredible list of resources compiled by Jonathan Dinu, Co-founder of Zipfian Academy, which trains data scientists and data engineers in San Francisco via immersive programs, fellowships, and workshops.

EDIT: I've had several requests for a permalink to this answer. See here: A Practical Intro to Data Science from Zipfian Academy

EDIT2: See also: "How to Become a Data Scientist" on SlideShare: http://www.slideshare.net/ryanor...

Environment
Python is a great programming language of choice for aspiring data scientists due to its general purpose applicability, a gentle (or firm) learning curve, and — perhaps the most compelling reason — the rich ecosystem of resources and libraries actively used by the scientific community.

Development
When learning a new language in a new domain, it helps immensely to have an interactive environment to explore and to receive immediate feedback. IPython provides an interactive REPL which also allows you to integrate a wide variety of frameworks (including R) into your Python programs.

STATISTICS
Data scientists are better at software engineering than any statistician and better at statistics than any software engineer. As such, statistical inference underpins much of the theory behind data analysis, and a solid foundation of statistical methods and probability serves as a stepping stone into the world of data science.

Courses
edX: Introduction to Statistics: Descriptive Statistics: A basic introductory statistics course.

Coursera Statistics, Making Sense of Data: An applied statistics course that teaches the complete pipeline of statistical analysis

MIT: Statistical Thinking and Data Analysis: Introduction to probability, sampling, regression, common distributions, and inference.

While R is the de facto standard for performing statistical analysis, it has quite a high learning curve and there are other areas of data science for which it is not well suited. To avoid learning a new language for a specific problem domain, we recommend trying to perform the exercises of these courses with Python and its numerous statistical libraries. You will find that much of the functionality of R can be replicated with NumPy, SciPy, matplotlib, and pandas (the Python Data Analysis Library).
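
For instance, here is a minimal sketch (with made-up data) of how a typical R-style exercise, basic descriptive statistics plus a simple linear regression, might look with NumPy, SciPy, and matplotlib:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Made-up data: y depends linearly on x plus noise
x = np.random.normal(loc=10, scale=2, size=200)
y = 3 * x + np.random.normal(scale=1.5, size=200)

# Descriptive statistics
print("mean=%.2f std=%.2f" % (x.mean(), x.std()))

# Simple linear regression, similar to lm(y ~ x) in R
slope, intercept, r, p, stderr = stats.linregress(x, y)
print("slope=%.2f r^2=%.3f p=%.3g" % (slope, r ** 2, p))

# Quick plot of the data and the fitted line
plt.scatter(x, y)
plt.plot(x, slope * x + intercept)
plt.show()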

Books
Well-written books can be a great reference (and supplement) to these courses, and also provide a more independent learning experience. These may be useful if you already have some knowledge of the subject or just need to fill in some gaps in your understanding:

O'Reilly Think Stats: An Introduction to Probability and Statistics for Python programmers

Introduction to Probability: Textbook for Berkeley’s Stats 134 class, an introductory treatment of probability with complementary exercises.

Berkeley Lecture Notes, Introduction to Probability: Compiled lecture notes of above textbook, complete with exercises.

OpenIntro: Statistics: Introductory text book with supplementary exercises and labs in an online portal.

Think Bayes: A simple introduction to Bayesian statistics with Python code examples.

MACHINE LEARNING/ALGORITHMS
A solid base of computer science and algorithms is essential for an aspiring data scientist. Luckily there is a wealth of great resources online, and machine learning is one of the more lucrative (and advanced) skills of a data scientist.

Courses
Coursera Machine Learning: Stanford’s famous machine learning course taught by Andrew Ng.

Coursera: Computational Methods for Data Analysis: Statistical methods and data analysis applied to physical, engineering, and biological sciences.

MIT Data Mining: An introduction to the techniques of data mining and how to apply ML algorithms to garner insights.

Edx: Introduction to Artificial Intelligence: The first half of Berkeley’s popular AI course that teaches you to build autonomous agents to efficiently make decisions in stochastic and adversarial settings.

Introduction to Computer Science and Programming: MIT’s introductory course to the theory and application of Computer Science.

Books
UCI: A First Encounter with Machine Learning: An introduction to machine learning concepts focusing on the intuition and explanation behind why they work.

A Programmer's Guide to Data Mining: A web based book complete with code samples (in Python) and exercises.

Data Structures and Algorithms with Object-Oriented Design Patterns in Python: An introduction to computer science with code examples in Python — covers algorithm analysis, data structures, sorting algorithms, and object oriented design.

An Introduction to Data Mining: An interactive Decision Tree guide (with hyperlinked lectures) to learning data mining and ML.

Elements of Statistical Learning: One of the most comprehensive treatments of data mining and ML, often used as a university textbook.

Stanford: An Introduction to Information Retrieval: Textbook from a Stanford course on NLP and information retrieval with sections on text classification, clustering, indexing, and web crawling.

DATA INGESTION AND CLEANING
One of the most under-appreciated aspects of data science is the cleaning and munging of data, which often represents the most significant time sink during analysis. While there is no silver bullet for this problem, knowing the right tools, techniques, and approaches can help minimize the time spent wrangling data.
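
As a purely hypothetical illustration (the file and column names below are made up), much of this routine cleaning can be expressed in a few lines of pandas:

import pandas as pd

# Hypothetical messy survey data
df = pd.read_csv("survey.csv")

df = df.drop_duplicates()

# Coerce bad values to NaN, then fill them with the median
df["age"] = pd.to_numeric(df["age"], errors="coerce")
df["age"] = df["age"].fillna(df["age"].median())

# Normalize free-text values and drop clearly erroneous rows
df["country"] = df["country"].str.strip().str.title()
df = df[df["age"].between(0, 120)]

df.to_csv("survey_clean.csv", index=False)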

Courses
School of Data: A Gentle Introduction to Cleaning Data: A hands on approach to learning to clean data, with plenty of exercises and web resources.

Tutorials
Predictive Analytics: Data Preparation: An introduction to the concepts and techniques of sampling data, accounting for erroneous values, and manipulating the data to transform it into acceptable formats.

Tools
OpenRefine (formerly Google Refine): A powerful tool for working with messy data, cleaning, transforming, extending it with web services, and linking to databases. Think Excel on steroids.

Data Wrangler: Stanford research project that provides an interactive tool for data cleaning and transformation.

sed - an Introduction and Tutorial: “The ultimate stream editor,” used to process files with regular expressions, most often for substitution.

awk - An Introduction and Tutorial: “Another cornerstone of UNIX shell programming” — used for processing rows and columns of information.

VISUALIZATION
The most insightful data analysis is useless unless you can effectively communicate your results. The art of visualization has a long history, and while it is one of the most qualitative aspects of data science, its methods and tools are well documented.

Courses
UC Berkeley Visualization: Graduate class on the techniques and algorithms for creating effective visualizations.

Rice University Data Visualization: A treatment of data visualization and how to meaningfully present information from the perspective of Statistics.

Harvard University Introduction to Computing, Modeling, and Visualization: Connects the concepts of computing with data to the process of interactively visualizing results.

Books
Tufte: The Visual Display of Quantitative Information: Not freely available, but perhaps the most influential text for the subject of data visualization. A classic that defined the field.

Tutorials
School of Data: From Data to Diagrams: A gentle introduction to plotting and charting data, with exercises.

Predictive Analytics: Overview and Data Visualization: An introduction to the process of predictive modeling, and a treatment of the visualization of its results.

Tools
D3.js: Data-Driven Documents — Declarative manipulation of DOM elements with data dependent functions (with Python port).

Vega: A visualization grammar built on top of D3 for declarative visualizations in JSON. Released by the dream team at Trifacta, it provides a higher level abstraction than D3 for creating canvas- or SVG-based graphics.

Rickshaw: A charting library built on top of D3 with a focus on interactive time series graphs.

Modest Maps: A lightweight library with a simple interface for working with maps in the browser (with ports to multiple languages).

Chart.js: Very simple (only six charts) HTML5 canvas-based plotting library with beautiful styling and animation.

COMPUTING AT SCALE
When you start operating with data at the scale of the web (or greater), the fundamental approach and process of analysis must change. To combat the ever increasing amount of data, Google developed the MapReduce paradigm. This programming model has become the de facto standard for large scale batch processing since the release of Apache Hadoop, the open-source MapReduce framework, in 2007.
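
The idea itself is small enough to sketch in pure Python; this toy, single-process word count only illustrates the map and reduce phases, it is not Hadoop:

from collections import defaultdict

def map_phase(document):
    # Emit (word, 1) pairs, as a Hadoop mapper would
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(pairs):
    # Sum the counts per key, as a Hadoop reducer would
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the quick brown fox", "the lazy dog", "the quick dog"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(pairs))  # {'the': 3, 'quick': 2, 'dog': 2, ...}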

Courses
UC Berkeley: Analyzing Big Data with Twitter: A course — taught in close collaboration with Twitter — that focuses on the tools and algorithms for data analysis as applied to Twitter microblog data (with project based curriculum).

Coursera: Web Intelligence and Big Data: An introduction to dealing with large quantities of data from the web; how the tools and techniques for acquiring, manipulating, querying, and analyzing data change at scale.

CMU: Machine Learning with Large Datasets: A course on scaling machine learning algorithms on Hadoop to handle massive datasets.

U of Chicago: Large Scale Learning: A treatment of handling large datasets through dimensionality reduction, classification, feature parametrization, and efficient data structures.

UC Berkeley: Scalable Machine Learning: A broad introduction to the systems, algorithms, models, and optimizations necessary at scale.

Books
Mining Massive Datasets: Stanford course resources on large scale machine learning and MapReduce with accompanying book.

Data-Intensive Text Processing with MapReduce: An introduction to algorithms for the indexing and processing of text that teaches you to “think in MapReduce.”

Hadoop: The Definitive Guide: The most thorough treatment of the Hadoop framework, a great tutorial and reference alike.

Programming Pig: An introduction to the Pig framework for programming data flows on Hadoop.

PUTTING IT ALL TOGETHER
Data Science is an inherently multidisciplinary field that requires a myriad of skills to be a proficient practitioner. The necessary curriculum does not fit neatly into traditional course offerings, but as awareness of the need for individuals with these abilities grows, we are seeing universities and private companies create custom classes.

Courses
UC Berkeley: Introduction to Data Science: A course taught by Jeff Hammerbacher and Mike Franklin that highlights each of the varied skills that a Data Scientist must be proficient with.

How to Process, Analyze, and Visualize Data: A lab oriented course that teaches you the entire pipeline of data science; from acquiring datasets and analyzing them at scale to effectively visualizing the results.

Coursera: Introduction to Data Science: A tour of the basic techniques for Data Science including SQL and NoSQL databases, MapReduce on Hadoop, ML algorithms, and data visualization.

Columbia: Introduction to Data Science: A very comprehensive course that covers all aspects of data science, with a humanistic treatment of the field.

Columbia: Applied Data Science (with book): Another Columbia course — teaches applied software development fundamentals using real data, targeted towards people with mathematical backgrounds.

Coursera: Data Analysis (with notes and lectures): An applied statistics course that covers algorithms and techniques for analyzing data and interpreting the results to communicate your findings.

Books
An Introduction to Data Science: The companion textbook to Syracuse University’s flagship course for their new Data Science program.

Tutorials
Kaggle: Getting Started With Python For Data Science: A guided tour of setting up a development environment, an introduction to making your first competition submission, and validating your results.

CONCLUSION
Data science is an infinitely complex field, and this is just the beginning.

 If you want to get your hands dirty and gain experience working with these tools in a collaborative environment, check out our programs at http://zipfianacademy.com.

There's also a great SlideShare summarizing these skills: How to Become a Data Scientist

You're also invited to connect with us on Twitter @zipfianacademy and let us know if you want to learn more about any of these topics.

See question on Quora

Posted on 7 November 2013

What are the ways for parallelizing python and numpy codes?


NumPy by default provides some Python wrappers for underlying C libraries like BLAS and LAPACK (or ATLAS). If you want multithreading, I think you can build NumPy against different libraries (like MKL). This guy did some benchmarking: http://stackoverflow.com/a/76459...

Before you look at parallel implementations, you should look at some more optimization (in case you are not aware of it): PerformancePython and PerformanceTips.

What you might want are Python wrappers to a high-performance linear algebra package written in C or Fortran. This exists for distributed memory systems in packages like PETSc, where they have petsc4py. If you need eigensolvers, there is also SLEPc and slepc4py (both rely on mpi4py). It takes a bit of messing around to set them up though.

In addition to PETSc, there is also Trilinos and its corresponding Python wrappers, PyTrilinos: PyTrilinos - Home. If you are looking at GPUs, there is PyCUDA: Andreas Klöckner's web page and Welcome to PyCUDA’s documentation!
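
As a rough sketch of the mpi4py route (assuming an MPI implementation and the mpi4py package are installed; the problem is deliberately trivial and only illustrates the pattern), each process works on its own slice and the pieces are combined with a reduction:

# Run with e.g.: mpiexec -n 4 python dot_mpi.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n = 1000000
local_n = n // size

# Each process generates (or would load) its own chunk of the vectors
np.random.seed(rank)
a = np.random.rand(local_n)
b = np.random.rand(local_n)

# Partial dot product per process, summed across all processes
local_dot = np.dot(a, b)
total = comm.reduce(local_dot, op=MPI.SUM, root=0)

if rank == 0:
    print("distributed dot product: %s" % total)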

Here is a really good presentation of high-performance Python: http://www.uni-graz.at/~haasegu/...

Unfortunately it is not as easy as it is in MATLAB, but if you put in some time, you will get massive speedups.

See question on Quora

Posted on 7 October 2013

I want to learn C or C++ programming language. I do not know anything about either, or programming, which should I learn first? Or are there other better alternatives like Java, or Python?


Gratuitous analogy time:
let's pretend you want to be a carpenter, and build a wooden house instead.

C is a hammer and some nails. You can build anything in principle, but as a beginner, it will take an awful lot of attempts to make anything bigger than a dog house.

C++ is a chainsaw and a nail-gun. Before making anything complicated, you must figure out how to make sure that you don't hurt yourself with the tools. While the tools are certainly powerful, my hat comes off to you if all your limbs are still intact when you can finally move into your new house.

Java is a corporate building contractor. You get to draw your own extravagantly detailed blueprints, but exactly how it translates into nails and boards is ultimately outside your control.

Python is a hired architect with a bunch of friends. You can tell it tersely what you want, and your house becomes a reality through the employment of a motley crew of freelancers who will have their own preferences of whether to use hammers or nail-guns. You can influence their work if you start asking questions, but you don't necessarily have to touch any of it.

C is good for programs that more-or-less directly access the operating system and hardware, C++ is good for larger applications that would be in C if it didn't take so long to write it all explicitly, Java is good for programs that should not be concerned with what type of computer they are running on, and Python is good for producing something that works very quickly, for further refinement if need be.

If I were teaching programming to beginners, I'd go with Python out of this lot, because although it does hide a bunch of things from the programmer, it still leaves you the option to dive into the details and figure out exactly what it does.

That's a good starting point for learning, I think.

See question on Quora

Posted on 13 August 2013

How do I become a data scientist?


There is a really comprehensive and cool visualization of the path to follow to become a data scientist.

The infographic shows the necessary skills to become a good data scientist and maps out the learning path of a data scientist across 10 different domains.

Edit: The image came from the article, Becoming a Data Scientist - Curriculum via Metromap - Pragmatic Perspectives, by Swami Chandrasekaran.



See question on Quora

Posted on 11 August 2013

How does Python compare to C#?


TL;DR


  • The answer is huge, but (hopefully) quite comprehensive. I programmed in C# / .NET for almost 10 years, so I know it really well. And I have been programming in Python at Quora for ~ 7 months now, so I hope I know it pretty well.
  • Python is the winner in: ease of learning, cross-platform development, availability of open source libraries
  • C# is the winner in: standard library, language features, development process and tools, performance, language evolution speed
  • Roughly even: syntax (Python is better in readability, C# has a more consistent syntax), adoption.


Syntax


Python pros:
  • Usually shorter notation:
    • indentation, “:” and “\” instead of “;”, “{“ and “}”
    • no need to decorate any identifier with its type (or “var” in C#)
    • no need to specify function argument types and return type
    • no need to nest each function into a class
    • no need to use public / protected / private; you should follow the “_name” or “__system_name__” convention, but it’s shorter anyway
    • no need to type “new” to invoke constructor
  • Generator / list comprehension expressions tend to be very short as well.

C# pros:
  • More consistent syntax and formatting rules. I can't remember any inconsistencies in C#, but I instantly wrote this list of inconsistencies in Python:
    • the fact that assignment is not an expression at all
    • old and new class syntax (“class X:” vs “class X(object):”)
    • weird base method call syntax: super(ClassName, self).method_name - it violates DRY twice and doesn’t work for old style classes
    • different formatting rules for = in regular assignment expressions and function arg. spec.
    • "except SomeError, error" syntax: show this fragment to someone who has never seen Python and ask them what it might mean
    • exception from common rule in tuple syntax: ( (), (1,), (1,2) )
    • class methods: IMO this feature just pollutes standard polymorphism without adding any significant improvement; worse, people frequently misuse it (e.g. always preferring class methods to static methods)
    • the way Python deals with self / cls (I’ll discuss this in the performance section) looks like a leaky abstraction; it adds a noticeable performance impact as well
    • threading.local is one of the worst abstractions I’ve ever seen; C# offers a few options here, but the most widely used is the [ThreadStatic] attribute, which turns a static variable into a thread-local static. But in Python you must inherit your class from threading.local to ensure its instance fields are thread-local, so the same instance sees different fields in different threads (see the sketch after this list). The abstraction is bad not just because it looks like a huge hack, but also because it requires you to write more code (an additional class) in the typical scenario (a thread-local static variable).
    • system method names with double underscores on both sides make you feel like you're writing in C :) (4 additional symbols for each such name)
  • In a few cases - shorter or cleaner syntax. E.g.:
    • lambda expressions and LINQ (vs generator expressions in Python) are shorter and more readable in C#
    • the lower_case convention for method / function / local names (i.e. most names) in Python requires more keystrokes than the camelCase convention in C#. Basically, you do 1-2 additional keystrokes per identifier.
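
A minimal sketch of the threading.local pattern mentioned above (illustrative names only): instance fields of a threading.local subclass are seen per thread.

import threading

class RequestContext(threading.local):
    def __init__(self):
        self.user = None

ctx = RequestContext()

def handle(name):
    # Each thread sets and sees its own 'user' on the same instance
    ctx.user = name
    print("%s sees user=%s" % (threading.current_thread().name, ctx.user))

threads = [threading.Thread(target=handle, args=("u%d" % i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()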

Language features


Python features, that don't map well to C# features:
  • everything is dynamic. Sometimes this is really useful - e.g. you can add your own “tags” (fields or methods) to nearly any object, or use "monkey patching" (What is monkey-patching?) to change the behavior of third-party code (see the sketch after this list). It’s arguable whether this is good from a design point of view, but there are cases when it’s really nice to have such an opportunity. You can achieve something similar in C#, btw - e.g. by employing dependency injection, or extension methods + weak references, but this is more complex. On the other hand, it makes you think more about an architecture that enables you to make such changes without hacky tricks.
  • decorators: there are equivalents in C#, but Python’s decorators are definitely unbeatable in simplicity and flexibility. Closest C# equivalents: nearly any DI container capable of aspect injection (in particular, based on decorator-like attributes); PostSharp aspects.
  • *args, **kwargs - most static languages don't provide such a way to enumerate/pass call arguments. On the other hand, there are several ways to process all arguments in C#.
  • yield expressions: yield can both return and accept a value in Python, but in C# it can only return a value. On the other hand, it looks like you can implement very similar logic with async/await (await accepts and returns a value).
  • Class methods: actually it’s good C# doesn’t have this feature :)
  • “with” contexts in Python can process exceptions: the __exit__ method there gets information about the thrown exception. Unfortunately this is impossible in C#: IDisposable.Dispose() takes no arguments. The feature can be really useful in some cases - e.g. when you need to make a commit/rollback decision inside the block's exit code. But you can handle this with nearly the same amount of delegate-based code in C#: dbContext.Transaction(() => { /* code */ })
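
Here is the kind of dynamism referred to above as a minimal sketch with hypothetical classes (not Quora's actual code): attaching a new field to an existing instance and monkey-patching a method at runtime.

class Report(object):
    def render(self):
        return "plain report"

r = Report()
r.author = "alice"  # attach a new "tag" field to an existing instance

def render_loudly(self):
    # Replacement behavior, e.g. for third-party code you can't edit
    return "PLAIN REPORT!!!"

Report.render = render_loudly  # monkey patch: all instances now use it
print(r.render())  # PLAIN REPORT!!!
print(r.author)    # alice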

Python features, that can be mapped to C# (i.e. there are close abstractions):
  • list comprehension syntax and generator expressions: LINQ to enumerable (LINQ in general is way more powerful)
  • generator methods: methods returning IEnumerable<T> and IEnumerator<T> in C#. Actually, this feature is implemented in a more solid fashion in C#: the result of an IEnumerable<T> method can be enumerated as many times as you want, and the result of an IEnumerator<T> method can be enumerated just once. Now look how it works in Python:

In [1]: def gen():
   ...:     yield 1
   ...:     yield 2

In [2]: g = gen()

In [3]: [x for x in g]
Out[3]: [1, 2]

In [4]: [x for x in g]
Out[4]: []

In [8]: g = gen()

In [9]: for x in g:
   ...:     print x
   ...:     if x == 1:
   ...:         break
   ...:
1

In [10]: [x for x in g]
Out[10]: [2]


  • lambda expressions: the same in C#; you can get ~ AST of such an expression in C# too. As I’ve mentioned, C# notation for lambda expressions is shorter: “lambda x: x + 1” in Python is “x => x+1” in C#
  • dynamic typing: dynamic keyword in C# + DLR infrastructure. IronPython/IronRuby are built on it.
  • runtime code generation / parsing / evaluation: lots of facilities in C#: lambda expressions, System.CodeDom, System.Reflection.Emit, third-party wrappers helping to implement typical tasks, and finally, compiler-as-a-service in C# 5
  • named parameters and default values: works ~ the same in C# in terms of syntax.
  • tuple expressions (tuple syntax): C# doesn’t provide any syntax sugar for this, but there are tuples (i.e. regular classes).
  • regex syntax: the same.

Now let’s look at C# features that are missing in Python. I’ll list just the most important ones from my own point of view, but you can get a good picture of the whole list by looking up this feature-by-feature comparison of C# and Java.
  • Multithreading: Python, in fact, is single threaded: there are “threads”, but the Global Interpreter Lock (GIL) allows just one Python thread to execute at any given moment (except when a thread is waiting for IO completion). There are Python implementations w/o a GIL - IronPython (.NET-based) and Jython (JVM-based) - but since library developers assume there is a GIL, you can't actually rely on its absence even on these platforms. The absence of multithreading brings really big performance-related disadvantages for web apps (pretty unexpected ones, btw, if you haven't faced them) - but I’ll cover this in the performance section.
  • Generics and structs: on one hand, Python doesn’t need generics, since it’s dynamic. On the other hand, they frequently provide huge benefits in terms of memory and speed in C# - even in comparison to Java with its type-erasure based generics. E.g. if Vector3D is a struct of 3 doubles in C#, var list = new List<Vector3D>(...) will really use ~ list.Length * 3 * sizeof(double) RAM for it, and will be represented by just 2 objects in the heap (List<T> + Vector3D[] - the underlying array it uses for storage). But Java has to allocate ~ list.Length objects in the heap for this list: each Vector3D is represented by an object in the heap there, so in total it uses at least 3 * list.Length * pointer_size more RAM. And Python has to allocate ~ 4 * list.Length objects in the heap (both Vector3D and each double live in the heap there).
  • LINQ: it isn’t the same as list comprehension in Python:
    • List comprehension is specific syntax sugar available just for iterables in Python
    • LINQ is syntax sugar as well, but it isn’t bound to a particular generic type: it works for any generic type meeting a few expectations. In particular:
      • If this type is IEnumerable<T>, it’s “LINQ to enumerable” - a set of extension methods operating as list comprehension in other languages
      • But if this type is IQueryable<T>, it’s what normally referenced as LINQ - a technology allowing to transform a query expression on C# to query on virtually any other language. In particular, it’s LINQ to SQL, LINQ in Entity Framework and so on.
      • Finally, underlying type can be your own. Few examples: LINQ to observables (Reactive Extensions) and parsing monad.
  • Interfaces. They're really good, if the same contract is implemented by several classes.
  • Extension methods: pretty nice feature, actually, although it’s just a syntax sugar improving code readability. Anyway, I used them in C# (mainly to add helpers to some built-in and third-party types like IEnumerable<T>, IRepository<T> and some enums), and their absence in other languages makes me a bit disappointed. You can use "monkey patching" in Python, that's something very similar, but not the same: extension methods don't change the original class and are applied if you import a namespace declaring some of them. So different code can import different extension methods (even if they have identical names).
  • Enumeration types. They’re implemented super-well in C#, and it’s quite good to have this feature integrated.
  • Method overloading: no analogue, but you can handle different argument sets differently in the same method in Python (although usually this is more complex). Method overloading is something natural for static languages, but quite unnatural for dynamic ones.
  • More native code, less interop / plumbing code. C# is fast, so normally you deal just with C# while writing in it. The same isn’t true for Python: it’s a bad idea to implement nearly any algorithm requiring a huge amount of RAM or CPU in Python; you’ll quickly learn that nearly any code requiring high performance must be written in C++. All this implies that:
    • If you develop a realtime web app in Python, I’m 99% sure you’ll end up writing a few modules in some other (faster) language - e.g. C++. Maybe it’s just 1-2% of the codebase, but it’s +1 language to learn and +1 dedicated subsystem to maintain. Besides C++, you need to know how to interact with C++ code from Python, i.e. how to write some plumbing Python code enabling the interop.
    • Compare this w/ interop in C#: I suspect 99% of web apps written in C# don’t have any C++ code at all, because C# is ~ as fast as C++; there are a few cases when you need C++, but they’re really rare. And even if you ever need this, C# provides really awesome interop capabilities.
  • async/await, Task Parallel Library and Parallel LINQ: they’re really useful in C#, and honestly, most other languages don’t offer anything similar in terms of usability. But since Python doesn’t support true concurrency, the situation is even worse there. Probably parallel processing in Cython is one of the simplest options available, but even that looks more like a workaround with a really limited application scope than a full solution, and AFAIK we never used it much in practice.

Standard library


C# definitely wins here: its BCL is:
  • better designed (more classes, less static functions)
  • strictly follows naming guidelines (Python's current standard library, by contrast, has lots of names violating PEP8 - they’re there mainly for compatibility)
  • better documented
  • and finally, it offers more abstractions.

I can't say Python's standard library is totally bad, but it really looks worse in comparison to .NET's.

Development process and tools


I used:
  • C#: Visual Studio .NET (2005, 2008, 2010 and 2012) w/ReSharper
  • Python: latest PyCharm, Vim and some Unix tools like grep

Python pros:
  • Interactive Python. You can test lots of small things instantly.
  • Fast startup / restart. Even such a large application as Quora starts in ~ 5 seconds on my devbox. That’s actually a very good property: it doesn’t add significant friction if you're used to doing small iterations.
  • No need to compile the project. More precisely, you _almost never_ need to compile the project: I estimate nearly any complex project has some non-Python code + some parts requiring Cython (Python-to-C compiler).
  • No need to maintain project files. I mean *.csproj and *.sln for C# - usually they’re managed by Visual Studio .NET, but almost any complex project requires some of them to be modified manually.

C# pros: most of the C# pros stem from the fact that it is statically typed, so development tools can rely on this. I’ll list just the most important ones:
  • Correct intellisense suggestions. I’d estimate that PyCharm provides a correct suggestion only in 30% of cases.
  • Correct refactorings. Nearly any refactoring in PyCharm requires manual fixes. I’d say PyCharm's dependency detection gives false positives in 30% of cases and false negatives in 10% of cases, so you always need to supervise it. And that’s really annoying in case of really large refactorings (100+ usages). Most Python developers use grep for this, and that’s way more complex. I’d say that’s probably the biggest disadvantage: you can easily spend 10+ minutes (in some cases, an hour or so) on actions that take 1 minute in ReSharper.
  • Way better “online” error detection / highlighting. The most typical example: if PyCharm is incapable of deriving the type of x, don’t expect it to highlight x.soemthing as an error. The same never happens with C# (as long as you don’t use dynamics, of course).
  • ReSharper provides way more helpers - mostly stub generators. E.g. "implement / extract interface" or "implement equality members" is something I'm used to. PyCharm has nothing similar (probably because Python doesn't support interfaces :) ). But anyway, the point is: despite the fact that C# requires you to write more boilerplate code, nearly all typical code can be quickly generated by tools like ReSharper. So you don't actually feel any friction about that.
  • VS.NET provides better support for related tools and technologies. Full support of e.g. Razor ASP.NET MVC templates (intellisense, navigation, errors, etc.), ASP.NET MVC itself and languages like Less/Sass is really useful: you do way fewer actions to add something standard.
  • Compilation detects maybe 50% of errors even before launching the code. Yeah, that’s one of the benefits of a statically typed language: you don’t need to run the code to detect a pretty large share of errors.
  • Better debugger, integrated profiler. I can’t say PyCharm is bad here (no profiler, but its debugger is very nice), but I have a feeling that VS.NET offers more useful options.
  • Relatively fast compilation and startup. “Relatively”, because it is way faster than e.g. GCC; startup is fast if you pay attention to this - actually it’s pretty easy to make it slow, especially for a large app.

Overall, I think C# is definitely the winner in this section. Python clearly gives some benefits if your codebase is relatively small, but most of them turn into disadvantages when it becomes large (or huge). E.g. the pretty big amount of friction associated with refactoring makes developers postpone it, which eventually increases technical debt. I can’t say whether a snowball effect is highly probable in this case, but at least its probability seems higher.

Performance


Disclaimer: I never studied CPython code, so a part of further statements explaining how it works internally might be false: I reason about this mainly by applying my Python debugging experience + general logic. Nevertheless estimated timings provided here are well-aligned with actual measurements, so hopefully, if there are some mistakes, they aren’t vital.

That’s the most painful part of the comparison for Python. CPython (the most widely used Python interpreter) has a set of issues related to performance, and I’ll try to cover the most important of them here; many of these issues apply to other dynamic languages as well, but definitely not all of them.

It's also worth mentioning that there is PyPy, which claims to be almost 6x faster than CPython; on the other hand, so far we couldn’t reach any speedup by running Quora on PyPy. I can’t fully explain why, but I feel this is mostly because:
  • We use Cython to improve performance at all major hotspots; it’s a tiny % of the codebase, but it looks like we’d have 30-50% worse performance w/o Cython. So our CPython codebase actually isn’t pure Python code
  • It seems PyPy isn’t quite efficient for applications having huge codebase and large working set
  • We didn’t try to implement any PyPy-specific optimizations in PyPy branch.

Currently the performance of the PyPy branch is ~ the same as, or slower than, the performance of the primary CPython branch; I’ll update this section if there are any changes.

So why is CPython slow? Let’s start with Alex Gaynor’s presentation:

It's worth looking through all the slides, but I’m going to describe what actually happens in CPython on this particular example. Fast-forward to slide 24: https://speakerdeck.com/alex/why...

Alex lists 3 allocations done by this code, but reality seems to be way worse - it’s actually pretty tricky to even list all the stuff involved there:

Python: int(s.split("-", 1)[0])

a) s.split(...) involves:
  • a dictionary lookup for "s" name. Few dictionary lookups, if this happens inside a closure (or few nested ones).
  • a dictionary lookup for “split” method of “str” type
  • creation of a bound method object even before the call, i.e. an allocation of a new object in the heap:

In [3]: x = "a".split
In [4]: y = "b".split
In [5]: x is y  # “is” performs reference comparison in Python
Out[5]: False
In [6]: z = str.split
In [7]: z is x
Out[7]: False
In [8]: x1 = "a".split
In [9]: x1 is x
Out[9]: False


This happens because methods are actually descriptor objects in Python. So when it executes some_object.some_method(some_value), a few dictionary lookups + a few allocations might actually happen:
  • CPython looks for “some_method” in “some_object”’s dictionary (actually, its class object, but let’s assume we still have constant lookup time assuming inheritance hierarchy is tiny)
  • And tries to get __get__ member of this object - to check if it’s a descriptor or not, so probably it’s +1 lookup (actually I hope system methods invoked by CPython itself are invoked ~ like virtual methods in C++, i.e. it’s fast). Since some_method is instance method, there is __get__ member
  • CPython invokes this __get__ member. To invoke it, it must construct and pass a tuple with its arguments. Quite likely, +1 allocation - it’s pretty unlikely they’re passed as structs on stack.
  • __get__ should return a bound method instance, i.e. it’s +1 allocation in the best case as well.
  • Likely, some_method call implies allocation of a new tuple object containing all passed arguments - i.e. (some_value,) in our case
  • Since we invoke a bound method, there can be two tuples: normally bound method adds self to a tuple passed to it, i.e. produces another tuple: (self, some_value)
  • Probably this doesn't apply to methods that don't accept **kwargs, but the ones that do require a new dictionary to be constructed (a pretty large object).

So:
  • instance method access =~ 1 dictionary lookup + 2 allocations/deallocations + 1 VMT-like lookup (__get__ member search)
  • instance method call =~ at least 1 dictionary lookup + 4 allocations/deallocations + 1-2 VMT-like lookups

b) The split(...) result is a list, which is usually composed of two objects: a list wrapper + an array that's re-allocated when the list outgrows its current capacity. So if it returns a list of 2 strings, there must be at least 4 new objects in the heap (list wrapper, array + 2 strings).

c) ...[0]: works like nearly any other method call. Fast languages allow you to turn off bounds checks for such ops, but the effect of doing this in Python would be negligible in comparison to the other expenses.

d) int(...):
  • a dictionary lookup for "int" name. Few dictionary lookups, if this happens inside a closure (or few nested ones).
  • likely, one tuple allocation (for method call arguments)

So let's count the minimum number of dictionary lookups and allocations/deallocations for this simple code:
  • Dictionary lookups: 1 + 1 + 1 = 3
  • Allocations/deallocations: 4 + 4 + 4 + 1 = 13

Quick measurement shows that Python needs at least ~50ns for dictionary lookup, and ~50ns for allocation. It’s more difficult to measure deallocation time, but it’s safe to assume it is comparable to allocation time. So expected performance limit of this piece of code is:
  • 50ns * 3 + (50ns + 50ns) * 13 = 1450ns

Checking with %timeit:

In [1]: s = '1-'
In [2]: int(s.split("-", 1)[0])
Out[2]: 1
In [3]: %timeit int(s.split("-", 1)[0])
1000000 loops, best of 3: 1.80 us per loop


So actual time is 1800ns.

Now let’s calculate what time the same operation could take in C#:

C#: int.Parse(s.Split('-')[0]):
  • 3 static calls, 0 virtual calls. A static call requires ~ 1ns in C#
  • 1 string[] allocation, if string.Split is optimized pretty well; a small object allocation requires ~ 10ns in C#
  • 2 string allocations, if Split produces 2 strings
  • No need to count deallocation time, since it’s nearly zero for any short-living objects in languages with true generational GC w/compactions.

So this code should require at least 33ns in C# (I'll add an actual measurement result here later). Actually it should take a bit more, since Parse and Split do some work, which should probably be comparable to these 33ns in terms of CPU time. But even this simple calculation shows this code must be ~ 44x faster in C#; likely the same is valid for other similar languages (Java, Scala, etc.).

So let’s summarize some of the CPython performance issues that were exposed here:

1. Lots of allocations / deallocations, that are costly

Thanks to:
  • Bound methods, and likely, a convention of passing arguments as a tuple
  • The fact that nearly any variable (even local) is stored in heap
  • Absence of generics and structs. By contrast, these two things together make C# code almost as efficient as C++ in terms of RAM consumption.
  • Finally, CPython uses ~ a regular memory allocator. This means that it has to find a large enough space in the heap to allocate the object and track the fact that this space is used, and do the opposite on deallocation. This happens in any case, i.e. for small, short-living objects as well.

Now imagine: .NET just needs to move a pointer + clean up the allocated block for a typical allocation, and does virtually nothing for a typical deallocation (really: Garbage collection (computer science)). So the .NET heap acts much more like a stack: there is no need to look for free space to allocate some RAM, and no need to worry about deallocation, since the GC touches just live objects and knows absolutely nothing about dead ones. And since most objects are short-living, .NET spends zero resources on deallocating most of them. Certainly there is an amortized deallocation cost, but it's way smaller than in Python (maybe even smaller than in C++, btw) due to this fact.

This explains, btw, why Java is nearly as fast as C#, although there are no structs and no true generics: this stack-like heap behavior makes allocations/deallocations of small objects really cheap - almost as cheap as placing them on the regular stack (structs and other value types in .NET live on the stack). So basically, Java may be slower due to this only when lots of such objects are long-living, so they're promoted to higher GC generations, and this adds some pressure on full GC cycles. This affects the CPU cache hit ratio as well, since any object in the heap has an additional header (16 bytes in 64-bit processes on .NET; not sure about Java), so if objects are small, the % of overhead data in a contiguous block of RAM can be pretty high - e.g. at least 80% for such blocks of integers. But anyway, that's Java; Python has way more serious issues here.

2. Relatively large objects

Most objects are dictionaries, so they tend to consume several times more RAM than e.g. objects in C#. Effectively this reduces the size of the CPU caches by the same factor. But L1/L2 misses are super-expensive - they can easily slow down your code by an order of magnitude, and Python programs tend to hit this 10x way more easily.

3. Bad garbage collection (GC) implementation

CPython doesn’t have a true generational GC. That's mentioned in 1), but here I'd like to show a few more side effects of that:
  • It never moves objects in memory (Do python objects move in memory during execution?), thus there are no "true" generations defined by border addresses in RAM (I suspect generation is just some mark in object's header there)
  • It uses reference counting + likely, mark and sweep-like collector to find and remove garbage
  • There are generations, but it looks like they aren’t quite helpful: a GC pause can easily take a few seconds if you have a large heap

Why is this bad?
  • No compactions = RAM fragmentation + bad CPU cache utilization. For comparison, .NET tunes up GC in such a way that Gen0 is always inside L1 CPU cache, and Gen1 is inside L2 CPU cache - i.e. most frequently accessed objects are cached quite efficiently in C#, and almost always aren’t in cache in Python - due to RAM fragmentation and significantly larger average object size. Details: Garbage collection (computer science)
  • Reference counting = global interpreter lock = no true multithreading + no benefits of fork()-ing the process to share a single copy of code and initial data set across several Python processes
  • GC pauses are proportional to the amount of RAM used by your application. Thus a CPython app using multiple GB of RAM and running without pauses of several seconds (if not minutes!) is basically impossible. Actually, even an app using around a hundred megabytes requires special GC tweaks and manually triggered GC cycles to avoid pauses at random moments.

Finally, I find it quite misleading that the "gc" module documentation doesn't reflect this: 27.12. gc - Garbage Collector interface - Python v2.7.5 documentation. Worse, if you look at the descriptions there, I bet the initial impression you'll get is: "cool, it has a generational GC!". There are parts that might make an experienced developer suspect the opposite (can you find them, btw?), but I feel like the only conclusion most people can draw is that Python has nearly the same GC implementation as other modern languages. And that's misleading.

4. Lots of dictionary lookups

  • Python uses dictionary lookup to resolve address of nearly any symbol (even local variable). Dictionary lookup time is ~ 50ns there (btw, it’s nearly the same in C# for Dictionary<K,V>).
  • Static languages (including C#) resolve virtual method addresses using Virtual Method Table lookup, and such lookup usually takes 2-3ns to complete.

.NET code uses dictionary lookups in just one case: when it invokes a virtual generic method parameterized by generic type(s) (i.e. the call looks like someObject.SomeMethod<T1,T2,...>(...)) - a dictionary lookup is the only option it has in this case. But this is a relatively rare thing, actually - especially for hotspot code. And even this dictionary lookup usually takes only ~ 20ns there.

5. Multithreading and related issues

As I wrote before, Python is, in fact, single threaded. And this alone makes it impossible to implement e.g. the following optimizations, which are available for web applications on .NET:
  • Any in-memory caches: you can share them across all the threads serving web requests on .NET, but you can’t do the same in Python, since there is no point in handling more than one web request in a single Python process (unless it spends a lot of time waiting on IO, of course). So you can assume each web request handler written in Python is actually a process with its own address space, and thus you can’t share caches across such processes. So if you run e.g. 16 Python processes on a web tier machine, you can assume that the effective cache size is ~ 1/16 of the cache size for the same .NET app. You can try to tackle this issue by running a process that maintains a shared cache for multiple Python processes, but you’ll get additional interop expenses (serialization and deserialization) that add a huge overhead in comparison to direct memory access.
  • The same actually applies to the Python code itself. Imagine you run a large web application that allocates 100MB right on start for code and data. If it’s a .NET application, it can utilize threads to service concurrent requests, so there is just one instance of these 100MB of shared stuff. If 5-10MB of it is the part that’s used most frequently (the working set), you have a good case, since it’s close to the L2 CPU cache size. Now imagine the same in Python: 16 similar processes allocate 1.6GB of RAM right on start; the working set size is 80-160MB, which is far beyond L2 cache size, so this one thing alone slows everything down by maybe a factor of 10. Probably you think you can fork a single Python process to ensure there is a single copy of these 100MB, but this won’t help much: CPython uses mixed garbage collection utilizing reference counting, so basically, if you reference / dereference something (e.g. copy a reference to some object into a local variable), this counter changes its value. So any memory page containing any object from this hot set has nearly zero chance of staying unmodified - i.e. nearly all pages will be copied pretty quickly by each of these processes after forking.
  • The above issues are way more painful for Python web apps than e.g. the absence of TPL, PLINQ and async/await analogues: you get parallelism almost for free here, so additional parallelism is actually rarely necessary. But this “free parallelism” in .NET is actually way better than the “free parallelism” in Python.

Cross-platform development support


Python clearly wins here: C# works very well on Unix under Mono, but it’s mostly about its base class library. Nearly anything tightly bound with Windows isn’t available there. Incompatibility map: Compatibility - Mono

Adoption


Roughly even: PYPL PopularitY of Programming Language index - pyDatalog

Availability of open source libraries


Based on my experience, Python and C# are roughly even in terms of availability of free / open source third-party libraries. Nearly everything you need to develop a web application is free on .NET.

But Python has definitely more open source projects on GitHub: Top Languages · GitHub (many C# projects are hosted on http://www.codeplex.com/ - originally it was the best place to host your own C# project, but right now it’s GitHub, so I think it’s fair to ignore this).

So Python is winner here.

Ease of learning


Python seems way easier to learn:
  • Basic syntax requires you to know fewer language constructs - e.g. no program in C# can be written w/o declaring a class; you need to know what compilation, assemblies, namespaces, classes, methods, public/private/static keywords, etc. are. On the other hand, you can write a program in Python without even declaring a function. So it’s easier to learn Python iteratively: you need to know almost nothing at the start, and use more and more features as you study it deeper. In contrast, C# requires you to learn way more before you even start to write your first program in it.
  • Interactive Python provides really nice way to learn the language and run quick tests.
  • Python's standard library is mostly built around functions - there are just a few classes, no complex inheritance, etc.; in contrast, the C# base class library is fully object-oriented: lots of classes, sometimes deep inheritance. Moreover, some parts of it require you to understand functional concepts very well - e.g. as I wrote, LINQ in C# is way more powerful than list comprehension syntax in Python, but that comes with associated learning expenses: list comprehension syntax in Python is, in fact, just a nicer way to write loops, and thus it’s super-easy to explain. And LINQ is, in fact, syntax sugar for defining monads in C# + implementations of some of them (IEnumerable<T>, IQueryable<T>). This description highlights the difference very well :) And LINQ isn’t the only example: there are a few other parts of the C# BCL that require you to deeply understand all the concepts; try learning WPF, for example.
  • You need to know a set of specialized tools to write C#. E.g. I used Visual Studio .NET with ReSharper and a set of other plugins, Far (it’s like Midnight Commander, but for Windows), Redgate .NET Reflector and IIS on a daily basis. Most people writing Python use just Vim/Emacs + a set of standard Unix tools. Not sure which is better here in terms of learning curve - the basics of VS.NET are pretty easy to learn; on the other hand, a typical Unix developer doesn’t need to learn any new tools at all to develop in Python, assuming he already knows Vim and Unix tools like grep. But... a typical Windows developer knows VS.NET as well :) Anyway, the point is: there are simpler (i.e. less advanced than for C#), but more generic development tools for Python.

Language and runtime evolution speed


I feel like C# evolves way faster.


See question on Quora

Posted on 16 May 2013

Is Python the most important programming language to learn for aspiring data scientists & data miners?


For aspiring Data Scientists, Python is probably the most important language to learn because of its rich ecosystem.

Python's major advantage is its breadth. For example, R can run Machine Learning algorithms on a preprocessed dataset, but Python is much better at processing the data. Pandas is an incredibly useful library that can essentially do everything SQL does and more. matplotlib lets you create useful visualizations to quickly understand your data.

In terms of algorithm availability, you can get plenty of algorithms out of the box with scikit-learn. And if you want to customize every detail of your models, Python has Theano. In addition, Theano is easily configured to run on the GPU, which gives you a cheap and easy way to get much higher speeds without having to change a single line of code or delve into performance details.
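
As a minimal sketch of that out-of-the-box experience (using scikit-learn's bundled iris toy dataset, nothing tuned), training and scoring a model takes only a few lines:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()

# Fit an off-the-shelf model and report accuracy on the training data
clf = RandomForestClassifier(n_estimators=100)
clf.fit(iris.data, iris.target)
print("training accuracy: %.3f" % clf.score(iris.data, iris.target))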

I've used R, MATLAB, Octave, Python, SAS, and even Microsoft Analysis Services, and Python is the clear winner in my book.

See question on Quora

Posted on 26 April 2013

Which is better, PHP or Python? Why?


Python will make you a better programmer over time, because the language is consistent, borrows good ideas from functional programming, is clean and easy to read, has lots of clever and useful constructs (decorators, iterators, list comprehensions, ...), has first-class functions, comes fully loaded with any library you've ever dreamed of, has a great community, and has clear and respected conventions and philosophy (look at PEP8), etc, etc.

Try out Flask to start getting results right away: it's easy enough to get a website up and running in a matter of minutes. The development environment is very easy to set up, but the production environment may be slightly harder to get right, depending on your sysadmin skills. Basic or free hosting is still on the PHP side, though.
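
For reference, a minimal Flask app really is just a few lines (development server only; a production setup needs a proper WSGI server in front of it):

from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from Flask!"

if __name__ == "__main__":
    app.run(debug=True)  # built-in development server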

Jython, a Python interpreter that runs on the JVM, is also available and may be something to look at: I believe it's commonly used to offer scripting capabilities to Java applications.

The learning curve isn't as steep as it might seem: indentation matters, no curly braces. Beyond that, you will feel right at home, but with more power and expressiveness than half-baked PHP.

Just as a reminder: PHP, born as Personal Home Page, was a set of simple macros written basically to get a dynamic visitor counter, and it grew from this legacy. The old days are way behind, but you can still feel the language's organic growth everywhere (inconsistencies, a half-baked object model, half-baked closures (really just anonymous functions, which landed only a few months (years?) ago), an amateurish community, etc).
If you still want to or have to work with PHP, have a look at Fabien Potencier's work: he's the lead dev of successful frameworks and libs such as Symfony, Twig, Composer, and some other top-notch stuff.

Anyway, just go with python :)

See question on Quora

Posted on 10 April 2013

Which is better, PHP or Python? Why?


My vote is for Python.  Here are just a few reasons based on my experience with both...

- Arrays: Most languages make a distinction between arrays and hashes.  PHP uses the same data type for both, which forces a programmer to jump through a lot of hoops testing the sanity of their array structures.  Have a look at "array" in the PHP documentation.  There are way too many array functions.

- Objects: In Python everything is an object, so string methods are accessible through string objects, array (or list) methods are accessible through the list object, etc., e.g. Python: my_list.pop() vs. PHP: array_pop($i_hope_this_is_indeed_an_array)

- Errors/Typos: PHP is hard to debug because it lets nearly anything fly with only notices or warnings.  For example, you can spend countless hours adding print statements throughout your PHP code only to discover a simple typo in a variable name.  Python will stacktrace if you try to access an uninitialized variable, or accidentally add an integer and a string (see the sketch below).

- Triple equal:  PHP WTF?  Need I say more?  :-)

- Conciseness: I personally feel like I can accomplish more with less code in Python, and it's easier to read.  It's not just the fact that Python doesn't require all the dollar signs, curly braces and other syntactic cruft, Python is simply more concise.

- Imports: IMHO, PHP's method of including code is confusing (include(), include_once(), require(), require_once()) and the new namespacing stuff is even worse.  Python uses import statements like Java... simple and clean.

This list could go on forever.  ;)
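
To illustrate the Errors/Typos point above with a tiny sketch: Python fails loudly where PHP would often carry on with a notice.

try:
    print(total)  # NameError: 'total' was never defined
except NameError as e:
    print("caught: %s" % e)

try:
    print(1 + "2")  # TypeError: can't add an int and a str
except TypeError as e:
    print("caught: %s" % e)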

See question on Quora

Posted on 9 April 2013

Which is better, PHP or Python? Why?


If you don't know either language and need to crank out a quick simple one-off web site where maintainability is much less important than time to first working version: PHP.

If raw performance of the runtime environment is a critical factor: Maybe PHP, thanks to the existence of Facebook's HipHop for PHP, which as far as I know isn't matched in raw performance by anything on the Python side. (Someone please tell me if I'm wrong.) Of course, few web apps are going to be CPU-bound.

If neither of those two criteria apply: Python.

Hey, someone needed to take a stab at a contrary answer.

See question on Quora

Posted on 8 April 2013

Which is better, PHP or Python? Why?


As a web developer, I have to say that Python (and frameworks like Django) are amazing.  The main problem is one of resources, which is why I always develop in PHP:

PHP is better than Python (as a realistic rapid development language) because:

  1. It is MUCH easier to find PHP developers (There are a lot more around).
  2. Thus easier to get someone to jump in on an existing project if it was done in PHP.
  3. PHP developers are CHEAPER to employ.
  4. Just about any hosting environment supports PHP in a standard configuration.  So when it comes down to deploying copies of the same package to many different servers, PHP becomes a much more logical solution.
  5. Django/Python is a pain in the ass to install.  It takes a lot longer to set up and configure and a lot fewer hosting companies support it.


See question on Quora

Posted on 5 April 2013

Is it time for us to dump the OOP paradigm? If yes, what can replace it?


There are two key goals in designing a programming language: decreasing coupling while increasing cohesion. That is, you're trying to split the problem "at the joints", where the parts that are naturally connected to each other stay connected, while the parts that can be swapped out are connected only at well-defined interfaces.

The fundamental unit of programming is the function, e.g.:


foo(a,b,c) -> d


A function wraps up some chunk of functionality and assigns it a name. Hopefully, you don't have to peer inside of it: you can deduce as much as possible about it from the name and its signature: "It takes a's, b's, and c's, and foo's them to make a d."

What you find with many functions is that the first argument is special. You end up reading it as "You take an a, and you foo it with a b and a c, to make a d".

This is the fundamental concept behind object-oriented programming: you declare that the first argument is special and say, "I'm going to split the world at that joint. I'll even make a special syntax for it":

a.foo(b,c) -> d


If you've split the world correctly, you can find other functions that also have more "a"-ness to them, and put them together. If your language is stateful, you can let these functions unpack that state for each other, which forces them to be tightly coupled but at least controls the scope of that coupling.

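In Python terms, the split is literally just a choice about where the first argument lives; a tiny sketch with made-up names:

def brew(maker, beans, water):
    # foo(a, b, c) -> d : a plain function whose first argument is "special"
    return "%s coffee" % beans

class CoffeeMaker(object):
    def brew(self, beans, water):
        # a.foo(b, c) -> d : the same functionality, attached to the object
        return "%s coffee" % beans

maker = CoffeeMaker()
print(brew(maker, "arabica", 0.3))
print(maker.brew("arabica", 0.3))
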
Note that what we're talking about here is really psychology rather than computer science. The joint-splitting happens because it's how people naturally perceive the world. The human mind is fundamentally object-oriented, though people don't always realize that their objects and other people's objects aren't always the same ones. There's nearly always more than one way to look at the world, both the natural one and the artificial one that computers created for themselves (windows, file systems, etc.)

And, as the question observes, sometimes you really don't find a natural joint-splitting at all. You end up perceiving it in terms of functions. However, the larger the problem is, the more necessary it will be to find some way to split it. Even if the joints don't correspond to natural divisions of the world, they may correspond to the natural divisions of programmers working on the same project. One may have expertise in back ends, others in computation, others in display. You will still need to follow the properties of cohesion and coupling, and object-oriented languages provide you a way to do that based on the natural human metaphor of putting things into boxes and hiding the details from the outside.

There is also mental baggage that comes from that, which is why small, one-shot programs tend to find the object-oriented paradigm cumbersome. Object-oriented paradigms tend to assume (without even realizing it) a natural flow-of-control, which can interfere when you try to adapt it to some other control mechanism (which is becoming increasingly common, as frameworks demand that you adhere to their flow of control rather than a library which follows your flow of control.)

I don't think OOP is going away any time soon. A more prominent trend, I believe, is going to be for transactionalism to begin to take center-stage. I'd like to see the distinction between short-term and long-term memory smudged (eliminating the embarrassing impedance mismatch problem of talking to long-term storage in a different language from your in-memory processing).

See question on Quora

Posted on 15 March 2013

Why is there a recent trend away from PHP towards Python and Ruby on Rails?


PHP was the first open source language designed for the web and reached maturity around 1999 with the release of PHP4.  Before PHP there was only perl (free, but a general purpose scripting language, kind of hard to learn) and ASP (which was not free and required an enterprise-level budget to run.)  So PHP had a head start of about 6 years over Ruby and Python.  (These languages existed since the mid 1990's but had no web frameworks written for them).

Despite PHP's many shortcomings (lack of true object orientation, weak exception handling, no lambdas, and as others have mentioned, being essentially a huge flat namespace of inconsistently-named functions) it won by its ubiquity.  It was free and even the cheapest commodity web hosting providers were offering PHP by 2002 or 2003, so it had a full generation in Internet years to establish itself as the common language for open source developers. 

The emergence of Rails in 2005 began to change that but it took a few years for Rails to gain mainstream acceptance.  Python followed suit with the development of the Django framework, on the same MVC pattern as Rails. 

Services like Heroku were essential in getting Ruby to the mainstream - you no longer needed to have dedicated servers or know how to compile source code to run a Ruby server - you essentially had the same consumer-level pricing for running Ruby apps that you had with PHP. 

Ruby and Python are overtaking PHP because developers tend to favor the languages - they have better abstractions and allow programmers to be more productive.  Also, the ubiquity of PHP worked against it a little because it meant that less skilled programmers could contribute code, and the quality of code in PHP projects is generally much lower as a result (see WordPress plugins for example), while the Ruby and Python communities have focused on developing better coding practices like Test Driven Development.  As a result, people who use Ruby and Python are perceived as "better" programmers, and more desirable hires.  New technology-focused companies are thus more likely to start projects in Ruby and Python because of the perceived higher quality of developers, even though for most web applications, an experienced team ("experienced" being the key) using Symfony or Cake can be just as productive as a team using Rails or Django.

There's always going to be a fringe language X that's favored by hackers and academics, but has no obvious business application and thus stays obscure, only to seemingly come out of nowhere years after its invention when the critical mixture of a user need and practical libraries is achieved.  Today it might be Haskell or OCAML or Scala.  It's been LISP for about 50 years now.

See question on Quora

Posted on 20 February 2013

Is it time for us to dump the OOP paradigm? If yes, what can replace it?


The problem with this question is that "object-oriented programming" is not well-defined.

What is an object? To me, it means "thing of which I have incomplete knowledge". Sometimes that's good. For example, a "file" could be local, on the network, a device, or stdout. The programmer shouldn't care, because the interfaces can be the same. Interfaces enable code reuse, prevent a variety of consistency errors (by disabling access to internals that might allow inconsistencies) and also keep the programmer sane by shielding her from irrelevant details.
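
A small Python sketch of that idea (illustrative only): the function below relies on nothing but a read() method, so a real file, an in-memory buffer, or anything else file-like can be passed in.

from io import StringIO

def count_words(stream):
    # relies only on the read() part of the file interface
    return len(stream.read().split())

print(count_words(StringIO(u'the quick brown fox')))  # -> 4, using an in-memory "file"
# with open('notes.txt') as f:                        # a real file works the same way
#     print(count_words(f))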

Interfaces are a major object-oriented win. This was the Alan Kay vision: encapsulate complexity behind interfaces, and replace the global space with locally interpreted functions (message passing). There's good and bad in it, but there are a lot of important ideas here.

Inheritance is mostly bad. I say "mostly" because it makes sense in building GUIs, but in little else.

Interfaces make programs easier to manage by limiting what the client needs to know about an object. Unfortunately, inheritance undoes this work by allowing the creation of objects with distributed semantics that are impossible to reason about.

The problem with object-oriented programming is that 99% of object-oriented code is complete garbage: illegible spaghetti written by mediocre programmers on Big Software projects. The discipline has devolved from Alan Kay's vision-- when complexity becomes inevitable, do this-- to a bizarre perversion of programming-- focus on writing giant, immensely complex objects rather than solving problems-- that managers love, because over-ambitious projects and giant teams are good for their careers, but that makes awful software.

The real enemy is Commodity Programmer culture and the corporate world in which programmers never improve (unless they're extremely insubordinate or energetic and find the time, because corporate work will make them worse programmers). What passes for "object-oriented programming" is hideous, but that's not to say that all of the ideas in it are bad. Sometimes, it's the right way to program.

See question on Quora

Posted on 12 January 2013

How do I learn Python?


The easiest way to learn a programming language is to first learn the basics and then try to build something with it (learn by doing). And it's better if you are building something you are actually interested in rather than something out of a book because it will get you to think about the problem and be more meaningful.

Python is easy to learn (not much syntax), easy to read (explicit vs implicit), has a big ecosystem (more packages/libraries), is taught at universities so it's easy to find good programmers to help, and is used by many large websites/companies (e.g., Quora is programmed in Python) so it's a good language to know.

Online Python Tutorials (in order from introductory to more advanced):

  1. "A Byte of Python" http://www.swaroopch.com/notes/P...
  2. Google's Intro to Python Class (online) - http://code.google.com/edu/langu...
  3. "Dive Into Python", by Mark Pilgrim http://diveintopython.org/toc/in...
  4. "The New Boston" Programming Python Tutorials - http://www.youtube.com/user/then...
  5. "Building Skills in Python", by Steven F. Lott - http://homepage.mac.com/s_lott/b...
  6. "Think Python: How to Think Like a Computer Scientist" - http://www.greenteapress.com/thi...
  7. "Code Like a Pythonista: Idiomatic Python"  -http://python.net/~goodger/proje...
  8. OpenCourseWare: MIT 6.00 Introduction to Computer Science and Programming - http://ocw.mit.edu/courses/elect....
  9. MIT 6.01 Course Readings (PDF) - http://mit.edu/6.01/mercurial/sp...
  10. Google's "Understanding Python" (more advanced talk) - http://www.youtube.com/watch?v=H...
  11. "A Guide to Python's Magic Methods" - http://www.rafekettler.com/magic...
  12. "Metaclasses Demystified" -http://cleverdevil.org/computing...

Book to Get: "Python Cookbook", by Alex Martelli (http://www.amazon.com/Python-Coo...)

And if you're building something Web based, look at using the Flask Web Framework (http://flask.pocoo.org/docs/).

Flask is a modern, lightweight, and well-documented Python Web framework, so you won't have to spend much time learning it or fighting with it -- you won't find yourself asking, "Will I be able to do what I want in the framework without hacking it?" Flask lets you program in Python rather than writing to the framework like you typically have to in larger, opinionated frameworks like Django and Rails.
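
For reference, a minimal Flask app is only a few lines (this assumes Flask is installed, e.g. via pip install Flask):

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello():
    # one view function mapped to the root URL
    return 'Hello, World!'

if __name__ == '__main__':
    app.run(debug=True)  # serves on http://127.0.0.1:5000/ by default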

See question on Quora

Posted on 28 April 2012

Does being proficient in more than one or two programming language(s) benefit one in the long run? Why or why not?


Yes, being a "polyglot programmer" benefits one in the long run.

Other languages give you other perspectives on how to solve problems, even if you stick to one language and its platform as the backbone of your career.

For large multi-language projects, understanding other languages will improve how you work with your team, even if you don't code in the other languages yourself.

See question on Quora

Posted on 26 February 2012

I want to learn to code Python and Django (web framework). What's the best way to start for a programming newbie?


Although I think Python is a better overall language, if you just want to slap a utilitarian web interface on some backend code for internal use, then PHP might be a better language to learn. It's easier to set up on the server, will run on virtually any host, and is more of an out-of-the-box solution.

As for Python/Django:

If you have never programmed before, it's definitely worth learning Python before you get to Django. Someone with experience could skip to a Django book/tutorial and pick up Python on the way - it's a simple language with very clear, easy to read and understand code.

How long it takes you to learn what you need to know is highly variable. If you are just trying to write some automation scripts to help cut down some manual labor, then you can probably go from zero to this point in a few weeks (maybe 20-30 hours). If you want to write production quality web apps using Python/Django, it's going to take longer.

Setup The Environment

First download Python if you don't have it. http://www.python.org/getit/ I prefer Linux, but your MacBook will be more than sufficient as a dev machine.

Python is in a state of limbo between the 2.7 release version and 3. While 3 is the future, it introduces some intrinsic changes which many of the popular libraries do not yet support, Django included. Your best bet is to start with 2.7 and switch to Python 3 later. Also, most of the learning material available is still written for Python 2.
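
For what it's worth, a couple of the headline 2-vs-3 differences can even be previewed from 2.7 with __future__ imports, so starting on 2.7 doesn't lock you out of the newer behavior:

from __future__ import print_function, division

print('hello')   # print is a function, as in Python 3 (a statement in plain Python 2)
print(3 / 2)     # -> 1.5 (true division), instead of 1 in plain Python 2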

You can write code in any text editor. My favorite, and an up-and-coming basic code editor, is Sublime Text. It is simple, elegant, and very functional. http://www.sublimetext.com/ It costs $59, but you can use it free for an unlimited amount of time (as of right now). Well worth buying though.

Many Mac developers love and swear by TextMate. It's more developed and further along than Sublime, I think. Costs $54, and has a 30-day trial.

If you get deeper into programming and want a full featured integrated development environment (IDE), then PyCharm is top notch. http://www.jetbrains.com/pycharm/ It costs $99 and has a yearly renewal fee for updates, but is worth it. Something like this has a much steeper learning curve than Sublime Text or TextMate, but they can save you time and keystrokes in the long run.

I'm going to assume you are familiar with working in the terminal, since you have IT experience. If not, this might be a good starting point: http://smokingapples.com/softwar...

Django apps can be run entirely on your own dev machine, but if you want to put it on the web to be accessed by others on your team, or from other machines you will need a host. There are some good questions on Quora about hosts, but ensure you choose one that allows Python and SSH access. I recommend finding a cheap Virtual Private Server (VPS), although this might be too steep a learning curve for someone without experience. (You say you've done a lot in the IT field, so some of this might be too basic for you, sorry).

I recommend learning and using Source Control. This helps manage your code revisions, and is particularly useful if you have more than one person working on it. I personally use Mercurial, but Git is more popular.

http://hginit.com/ is a good intro guide for Mercurial. http://learn.github.com/p/intro.... looks to be good for Git, but I haven't worked through it yet.

In addition to using Source Control, you'll need a source code repository (you'll learn what this means in one of those tutorials). GitHub (http://www.github.com) is the most popular, with BitBucket (http://www.bitbucket.org) coming in second. You can use Git on either, but GitHub does not support Mercurial. Also, BB has better options for free accounts - unlimited free repos, whereas GitHub limits you.

You might feel overwhelmed trying to learn how to program Python, learning Django, and trying to figure out source control and a myriad of tools all at once. In my opinion it's best to get down a version control workflow early on, rather than putting it off. You'll develop good habits early on that will help you down the stretch.

Where to Learn
There are a ton of resources for learning Python, and quite a few for Django. Be sure that whatever you choose, you go with resources that consistently use either Python 2 or 3. Also, stay away from small tutorials and stick with complete references. Learning from piecemeal tutorials will leave you with fragmented knowledge, and they are usually lower quality.

Here is a list of references taken from another Quora question. The key to learning how to program, in my opinion, is to practice a lot. So do the exercises these books contain, and do more programming on your own.

Online Tutorials & Ebooks
All free

Recommended: http://www.diveintopython.net/
http://docs.python.org/tutorial/
http://swaroopch.com/notes/Python
http://homepage.mac.com/s_lott/b...
Recommended: http://greenteapress.com/thinkpy... (A higher level look at programming with Python as the tool; highly recommended if you want to be a good programmer)
http://python.net/~goodger/proje...
http://learnpythonthehardway.com/

Videos

http://code.google.com/edu/langu...
http://www.youtube.com/user/then...
Recommended: http://ocw.mit.edu/courses/elect... (A higher level look at programming with Python as the tool; highly recommended if you want to be a good programmer)

Books
Sometimes having a physical book makes it easier for some people to learn. Many of the above ebooks are available in hard copy.

Dive Into Python
Think Python
Learn Python the Hard Way
A Byte of Python

How do I learn Python?

All of those are Python references. The online material available for Django is more sparse, but there are some good resources.

The Django Book is the starting point for most people: http://www.djangobook.com/

There is, of course, the official tutorial: https://docs.djangoproject.com/e... I found Django Book more useful. However, get very familiar with the Django docs. They are very good, and you will be spending a lot of time digging into them.

This is a highly recommended hardcopy book for learning, but I've not used it: https://www.packtpub.com/django-...

Prefer video? This series ought to be very good: http://teamtreehouse.com/library... I have not tried it yet either. There is a $25/mo fee for their service.

Getting Assistance
Inevitably, when you are learning or attempting to build something, you're going to run into a brick wall at some point.

This is my workflow if I get stuck on a concept, or while programming:
Check the Documentation -> Check the Source Code -> Search Google -> Ask on StackOverflow

Asking is always a last resort, quite simply because figuring it out on my own gives more of a sense of pride and accomplishment, and I'm more likely to remember the solution.

Python Docs: http://docs.python.org/
Django Docs: https://docs.djangoproject.com/e...

See question on Quora

Posted on 15 December 2011

What are some cool Python tricks?


Create infinities
Infinity, and its brother minus infinity, come in handy once in a while.

my_inf = float('Inf')
99999999 > my_inf
-> False

my_neg_inf = float('-Inf')
my_neg_inf > -99999999
-> False


Intuitive comparisons
A great example of the simplicity of python syntax.

x = 2
3 > x == 1
-> False
1 < x < 3
-> True
10 < 10*x < 30
-> True
10 < x**5 < 30
-> False
100 < x*100 >= x**6 + 34 > x <= 2*x < 5
-> True


Enumerate it
Ever wanted to find that damn index when you're inside a loop?

mylist = [4,2,42]
for i, value in enumerate(mylist):
    print '%d: %d' % (i, value)
-> 0: 4
-> 1: 2
-> 2: 42


Reverse it
This has grown to become a part of my morning ritual. Reverse. Anywhere. Anytime. All the time.

# Reverse the list itself:
mylist = [1,2,3]
mylist.reverse()
print mylist
-> [3,2,1]
# Iterate in reverse
for element in reversed([1,2,3]): print element
-> 3
-> 2
-> 1


Ultra compact list generating
Using nested list comprehensions you can save a great deal of typing, while having fun impressing the girls.

[(x**2, y**2, x**2+y**2) for x in range(1,5) for y in range(1,5) if x<=y and x%2==0]
-> [(4, 4, 8), (4, 9, 13), (4, 16, 20), (16, 16, 32)]


NB! Crazy nesting should be used with extreme caution.
Readability > Girls
-> True


Splat call
'*' is called the splat operator, and may make you smile. It automagically unpacks stuff in a function call.

def foo(a, b, c):
    print a, b, c

mydict = {'a':1, 'b':2, 'c':3}
mylist = [10, 20, 30]

foo(*mydict)
-> a b c
foo(**mydict)
-> 1 2 3
foo(*mylist)
-> 10 20 30


The cute empty string trick
By using two single quotes ('') and a dot (.), we have access to all the builtin string functions. This can come in handy, you see.
''.join(['I','Want','Just','One','String'])
-> 'IWantJustOneString'


Itertools
The itertools module provides some useful and efficient functions. For example:

from itertools import chain
''.join(('Hello ', 'Kiddo'))
-> 'Hello Kiddo'
''.join((x for x in chain('XCVGOHST', 'VSBNTSKFDA') if x == 'O' or x == 'K'))
-> 'OK'


When Las Vegas just isn't enough
Buy in some beer, invite a few (exactly 10) friends over, and copy/paste this sexy line of python code into your favorite interpreter. The rules are:
1. Press Enter
2. The one who gets the fewest stars has to CHUG CHUG CHUG!
3. Press the up arrow
4. Goto 1.

from random import randint
print "\n".join(str(i)+":\t"+"*"*randint(1,10) for i in range(1,11))


Update:

Make python enums
I like this enumification trick:
class PlayerRanking:
  Bolt, Green, Johnson, Mom = range(4)

PlayerRanking.Mom
-> 3
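
(As an aside, Python 3.4 later added a standard enum module that covers the same ground more explicitly; a rough equivalent of the trick above:)

from enum import Enum  # Python 3.4+, or the enum34 backport

class PlayerRanking(Enum):
    Bolt = 0
    Green = 1
    Johnson = 2
    Mom = 3

PlayerRanking.Mom.value
-> 3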


See question on Quora

Posted on 2 December 2011

What are some cool Python tricks?


List comprehensions and generator expressions

Instead of building a list with a loop:
b = []
for x in a:
    b.append(10 * x)
foo(b)

you can often build it much more concisely with a list comprehension:
foo([10 * x for x in a])

or, if foo accepts an arbitrary iterable (which it usually will), a generator expression:
foo(10 * x for x in a)

Python 2.7 supports dict and set comprehensions, too:
>>> {x: 10 * x for x in range(5)}
{0: 0, 1: 10, 2: 20, 3: 30, 4: 40}
>>> {10 * x for x in range(5)}
set([0, 40, 10, 20, 30])


Fun tricks with zip

Transposing a matrix:
>>> l = [[1, 2, 3], [4, 5, 6]]
>>> zip(*l)
[(1, 4), (2, 5), (3, 6)]

Dividing a list into groups of n:
>>> l = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
>>> zip(*[iter(l)] * 3)
[(3, 1, 4), (1, 5, 9), (2, 6, 5), (3, 5, 8)]


import this

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


See question on Quora

Posted on 1 December 2011

What can be done with Ruby or Python that just can't be done with PHP?


There's very little that cannot be done with PHP -- in fact it's not really an interesting question.

The more interesting question asks what things can't be done in PHP elegantly? Or simply? Or safely? Or quickly? Or shouldn't have to be done at all? These types of questions form much more useful metrics by which to compare general purpose programming languages.

A good example of that last category btw is PHP's maddening tendency to output errors to the socket it's reading from. I realize that PHP was originally a templating language and that this may have made sense at one time, but having to prefix calls with @ to suppress otherwise important error messages because they go to stdout warps the mind just a bit.

See question on Quora

Posted on 25 June 2011

What are good books of advanced topics in Python?


If you haven't yet, I highly recommend reading "Think Python: How to Think Like a Computer Scientist" available free in PDF format at http://www.greenteapress.com/thi.... It is also available at Amazon for the actual book. It is different than most programming books I have read in that it focuses less on teaching a language and more on how to be a good programmer. You will learn from it, even if you're already pretty slick with Python.

You might also take a look at Google's Python Class - http://code.google.com/edu/langu.... It is geared toward the beginner Python coder, but might have some good lessons for you still. Most of it is taught through a series of videos, covering strings, lists and sorting, dictionaries and files, regular expressions, utilities, and urllib. At the very least, take a look through some of the exercises, as they are also the kind that make you think and do an excellent job at teaching good programming habits.

(Disclaimer: I haven't watched all of those videos yet, but everything I've seen so far is excellent.)

I forgot to mention: If you are interested in some non-book high quality learning material, MIT has an open courseware series covering Computer Science and Programming at http://ocw.mit.edu/courses/elect... . It starts out pretty basic, but some of the advanced topics include testing and debugging, object oriented programming, encapsulation and inheritance, and then some math oriented topics like a stock market simulation, normal and exponential distributions, linear regression.

Some of the skills it teaches are things you won't find in an ordinary programming book.

Finally (I swear), Natural Language Processing With Python might be of interest to you. http://www.nltk.org/book

See question on Quora

Posted on 19 January 2011


Personally I'd go with either Flask or Django since both of them are rather straightforward. The tutorial sections on both project sites (http://docs.djangoproject.com/en..., http://flask.pocoo.org/docs/tuto...) are IMO quite nicely written.

But if you already have a specific framework in mind, those general tutorials will only really help you get started and understand the basic concepts. For anything beyond that you'll have to check the rest of the respective documentation and probably also quite a few blogs, since - at least most of the time - current best practices aren't really represented in the official docs.

For example for a simplified version of this site (with people asking questions and having a profile), you'll probably want to have two applications (account, questions) where you create models for the questions, answers, user profiles etc.

Once you have your project ready for deployment you'll probably want to look into some deployment environments for WSGI like gunicorn, uwsgi, mod_wsgi for apache etc. as well as helper scripts like fabric (http://fabfile.org/) that make the whole re-deployment process much simpler.
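
For reference, the interface all of those servers speak is WSGI, and a minimal WSGI application is tiny (module name below is just an illustration); gunicorn or uwsgi would then be pointed at the application callable:

# myapp.py -- hypothetical module name; serve with e.g. "gunicorn myapp:application"
def application(environ, start_response):
    body = b'Hello from WSGI\n'
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]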

See question on Quora

Posted on 17 January 2011

How do I become a data scientist?


Strictly speaking, there is no such thing as "data science" (see What is data science? ). See also: Vardi, Science has only two legs: http://portal.acm.org/ft_gateway...

Here are some resources I've collected about working with data, I hope you find them useful  (note: I'm an undergrad student, this is not an expert opinion in any way).

1) Learn about matrix factorizations

  • Take the Computational Linear Algebra course (it is sometimes called Applied Linear Algebra, Matrix Computations, Numerical Analysis, or Matrix Analysis, and it can be either a CS or an Applied Math course). Matrix decomposition algorithms are fundamental to many data mining applications and are usually underrepresented in a standard "machine learning" curriculum. With TBs of data, traditional tools such as Matlab are no longer suitable for the job; you cannot just run eig() on Big Data. Distributed matrix computation packages such as those included in Apache Mahout [1] are trying to fill this void, but you need to understand how the numeric algorithms/LAPACK/BLAS routines [2][3][4][5] work in order to use them properly, adjust for special cases, build your own, and scale them up to terabytes of data on a cluster of commodity machines. [6] Usually numerics courses are built upon undergraduate algebra and calculus, so you should be good with prerequisites.  I'd recommend these resources for self study/reference material:
  • See Jack Dongarra : Courses and What are some good resources for learning about numerical analysis?

2) Learn about distributed computing


3) Learn about statistical analysis

  • I've found that learning statistics in a particular domain (e.g. Natural Language Processing) is much more enjoyable than taking Stats 101. My personal recommendation is the course by Michael Collins at Columbia (also available on Coursera).
  • You can also choose a field where the use of quantitative statistics and causality principles [7]  is inevitable, say molecular biology [8], or a fun sub-field such as cancer research [9], or even narrower domain, e.g. genetic analysis of tumor angiogenesis [10] and try answering important questions in that particular field, learning what you need in the process.

4) Learn about optimization


5) Learn about machine learning


6) Learn about information retrieval


7) Learn about signal detection and estimation


8) Master algorithms and data structures


9) Practice


If you do decide to go for a Masters degree:

10) Study Engineering

I'd go for CS with a focus on either IR or Machine Learning or a combination of both and take some systems courses along the way. As a "data scientist" you will have to write a ton of code and probably develop distributed algorithms/systems to process massive amounts of data. An MS in Statistics will teach you how to do modeling and regression analysis, etc., not how to build systems; I think the latter is more urgently needed these days as the old tools become obsolete with the avalanche of data. There is a shortage of engineers who can build a data mining system from the ground up. You can pick up statistics from books and experiments with R (see item 3 above) or take some statistics classes as a part of your CS studies.

Good luck.

[1] http://mahout.apache.org/
[2] http://www.netlib.org/lapack/
[3] http://www.netlib.org/eispack/
[4] http://math.nist.gov/javanumeric...
[5] http://www.netlib.org/scalapack/
[6] http://labs.google.com/papers/ma...
[7] Amazon.com: Causality: Models, Reasoning and Inference (9780521895606): Judea Pearl: Books
[8] Introduction to Biology , MIT 7.012 video lectures
[9] Hanahan & Weinberg, The Hallmarks of Cancer, Next Generation: Page on Wisc
[10] The chaotic organization of tumor-associated vasculature, from The Biology of Cancer: Robert A. Weinberg: 9780815342205: Amazon.com: Books, p. 562


See question on Quora

Posted on 25 August 2010

What are common uses of Python decorators?


Decorators are convenient for factoring out common prologue, epilogue, and/or exception-handling code in similar functions (much like context managers and the "with" statement), such as:
  • Acquiring and releasing locks (e.g. a "@with_lock(x)" decorator)
  • Entering a database transaction (and committing if successful, or rolling back upon encountering an unhandled exception)
  • Asserting pre- or post-conditions (e.g. "@returns(int)")
  • Parsing arguments or enforcing authentication (especially in web application servers like Pylons where there's a global request and/or cookies object that might accompany formal parameters to a function)
  • Instrumentation, timing or logging, e.g. tracing every time a function runs
They are also used as shorthand to define class methods (@classmethod) and static methods (@staticmethod) in Python classes.
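
A sketch of the instrumentation/timing case from that list (the decorator name here is made up):

import functools
import time

def timed(func):
    # wrap func so every call logs its duration
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            print('%s took %.3fs' % (func.__name__, time.time() - start))
    return wrapper

@timed
def slow_sum(n):
    return sum(range(n))

slow_sum(10 ** 6)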

See question on Quora

Posted on 19 January 2010

Why is the programming language Python called Python?


"At the time when he began implementing Python, Guido van Rossum was also reading the published scripts from "Monty Python's Flying Circus" (a BBC comedy series from the seventies, in the unlikely case you didn't know). It occurred to him that he needed a name that was short, unique, and slightly mysterious, so he decided to call the language Python."

From the Python FAQ
http://www.python.org/doc/faq/ge...

See question on Quora

Posted on 18 December 2009

reddit.com: search results

When do you *NOT* use python?

Hi everyone,

We're all python fans here, and to be fair I may use it a bit more than I should. I'd like to hear other people's thoughts on which tasks they want to solve in a non-python language and which one they'd choose for that job.

Thanks in advance...

submitted by RealityShowAddict to Python
[link] [421 comments]

Posted on 16 February 2015

python.org should stop steering web visitors away from v3 docs

$ curl --head https://docs.python.org/library/string.html
HTTP/1.1 301 Moved Permanently
Date: Fri, 06 Feb 2015 23:22:21 GMT
Server: nginx
Content-Type: text/html
Location: https://docs.python.org/2/library/string.html

This is another contributing factor to why v3 adoption is slow, and new users are confused. This configuration affects everything from StackOverflow links (how I first noticed it) to Google pagerank.

It's why Python3 docs don't often show up in search results.

docs.python.org should default to v3. Or, at the very least, display a disambiguation page, a la Wikipedia.

submitted by caninestrychnine to Python
[link] [68 comments]

Posted on 6 February 2015

What do you *not* like using Python for?

Maybe sounds like a silly question, but here's the context: Been programming for ~10 years, professionally for the past 7. Matlab, C#, C++ (in decreasing order of proficiency). Per management, it looks like I'll now be getting into some Python for an upcoming project... which is cool, as with how prevalent Python seems to be, I've wanted to get my feet wet for a while.

Obviously all languages have their bounds... or at least things they do better than others. So - as I'm getting my feet wet here, does anything stand out as far as areas where Python is weak and there may be better alternatives?

submitted by therealjerseytom to Python
[link] [354 comments]

Posted on 18 October 2014

Python subreddit has largest subscriber base of any programming language subreddit (by far).

Python 80,220 (learnpython 26,519)
Javascript 51,971
Java 33,445
PHP 31,699
AndroidDev 29,483
Ruby 24,433
C++ 22,920
Haskell 17,372
C# 14,983
iOS 13,823
C 11,602
Go 10,661
.NET 9,141
Lisp 8,996
Perl 8,596
Clojure 6,748
Scala 6,602
Swift 6,394
Rust 5,688
Erlang 3,793
Objective-C 3,669
Scheme 3,123
Lua 3,100
"Programming" 552,126
"Learn Programming" 155,185
"CompSci" 73,677
submitted by RaymondWies to Python
[link] [120 comments]

Posted on 21 September 2014

What are the top 10 built-in Python modules that a new Python programmer needs to know in detail?

I'm fairly new to Python but not to Programming. With the programming languages that I've learned in the past I always see a recurring pattern — some libraries (modules) are more often used than others.

It's like the Pareto Principle (80/20 rule), which states that 80% of the outputs (or source code) will come from 20% of the inputs (language constructs/libraries).

That being said, I would like to ask the skilled Python veterans here on what they think are the top 10 most used built-in modules in a typical Python program, which a beginner Python programmer like me would benefit to know in detail?

EDIT:

Thanks to all that have replied :)

I found a site where I can study most of the modules that you suggested:

(Python Module of the Week)

Contents: http://pymotw.com/2/contents.html

Index: http://pymotw.com/2/py-modindex.html

Of course, there is no substitute for the official documentation when it comes to detailed information:

Python 2.7.*: https://docs.python.org/2/library/index.html

Python 3.4.*: https://docs.python.org/3.4/library/index.html

submitted by ribbon_tornado to Python
[link] [135 comments]

Posted on 24 June 2014

I made my first ever thing in Python, and am really proud of it.

I'm not sure if this is more appropriately posted to /r/learnpython, but I'm new here and new at programming and I just really felt like sharing my excitement with someone! I've been interested in computers since almost forever, and have more recently been trying to actually learn some coding (a little html and python, just the basics). After figuring out how to use Python a bit, I got Portable Python and sat myself down with the project of creating a 2-player game of Tic Tac Toe, and I did.

I've never really done anything on this level with a computer before and I just feel like I've opened up a door to a whole new world! I feel powerful for what I've done and I can't wait to do more.

Here is my little program if you guys are interested in seeing what an awful newb's poorly documented code looks like.

And happy coding to all! :D

submitted by DiscyD3rp to Python
[link] [127 comments]

Posted on 13 May 2014

Learning python earned me a 50% raise, and it's about to again.

(Sorry for the throwaway, but I wanted to be able to answer questions honestly without any hesitation.)

I've been in IT since I was 17 in 1999. I started off at a help desk, and worked my way up to a Systems Administrator where I was making 60k USD/yr. (I currently have only an associates degree with no plans to go back to school.) I was primarily a Windows domain/ network admin, with a few *nix boxes spread throughout. I had known windows batch scripting, and way back in the day had programmed in BASIC before the world was.

I had tossed around the idea of learning a programming language before, but when asked I'd often say "Developers' brains just work differently than mine. I'm not a coder." Programming seemed so abstract and I couldn't really wrap my head around it. I finally decided though, to try something.

It was 2010 and I had heard a lot about Ruby on Rails and thought that Ruby would be a great language to learn. I ran through the tutorial of making a polls app at least 5 times, but I just couldn't wrap my head around it. So I gave up.

One year later I heard about python. Despite all the negative talk about python while googling for "python vs ruby vs php vs ..." (GIL, speed, whitespace, duck typing, (not that I knew what ANY of that meant anyway)) I decided that I really wanted to give it a shot. I started out with codeacademy to get my feet wet, I'd tinker with idle while my wife and I would watch netflix after the kids went to bed. Then I started dreaming in code.

Have you ever had "work dreams"? The kind you have for about 2 weeks after starting a new job that's really hard? That was python for me. Being primarily in a Windows environment it was hard to find anything for python to do initially at work. My boss didn't program, and really didn't see the value in it. Then one day I found myself needing to compare a list of files. I needed to find all the files that were in one column but not in the other. I had them in excel and after working through a formula I had my answer, but I hated it. All I wanted to do was write something like--

select name from column1 where name not in (select name from column2); 

Enter python and sqlite. It probably took me about 3 hours to figure it out, but I imported a csv into a sqlite table in python so I could query it. BAM! I was hooked from then on.
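
The CSV-into-SQLite trick described there looks roughly like this (file and column names are invented for illustration):

import csv
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE col1 (name TEXT)')
conn.execute('CREATE TABLE col2 (name TEXT)')

with open('column1.csv') as f:
    conn.executemany('INSERT INTO col1 VALUES (?)', ([row[0]] for row in csv.reader(f)))
with open('column2.csv') as f:
    conn.executemany('INSERT INTO col2 VALUES (?)', ([row[0]] for row in csv.reader(f)))

# files present in column1 but not in column2
for (name,) in conn.execute('SELECT name FROM col1 WHERE name NOT IN (SELECT name FROM col2)'):
    print(name)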

Every night I would tinker, read, and play. I found tons of things to automate at work, making my time so much more effective. I loved it. I became a python evangelist. I'd like to say that my boss was impressed, but really he never came around, and it frustrated me. Fast forward a year.

I had heard about the DevOps movement and though I didn't understand it completely at the time I thought that being a Developer and Systems Admin mutant sounded like a lot of fun, and something I could really be good at.

After having a rough time with my boss one day I decided to check the local classifieds. I saw an ad for a DevOps Admin. Basically this guy needed to know hardware, networking, provisioning, something called puppet, and one of three scripting languages- ruby, bash, or python.

I looked at puppet, and after having learned about booleans and strings and syntax from python, picking it up wasn't a problem. I got hired on the spot for $90k USD. A clean 50% raise. I use python every single day. I write scripts to check if databases back up properly, if services are up, if all 1000 of my physical servers are getting their updates, to provision RAIDs, you name it. I integrate what I write into puppet, fabric, and a host of other tools that I've learned along the way.

After doing that for a little over a year now, I'm about to hire 2 guys under me as we expand and I'm moving up to $120k USD. I'm learning django for fun and am just starting into machine learning. I check out /r/python every day, you guys have been so helpful to me along my way. And if I can learn python, anybody can!!!

TL;DR I learned python in a year and got a 50% raise. 1 year later I got another 25% raise, all from python!

edit: percentages, oh math...

submitted by self_made_sysad to Python
[link] [142 comments]

Posted on 6 May 2014

What is the best part of python you wish people knew about?

I just quit my job at a major software company to be with a startup in downtown Seattle and it looks like our stack is Python based. I'm new to Python but I want to learn fast; so please, let me know what you like the most (or hate the most?) about Python, other Python developers' code, etc., so I can take all the good and not use the bad as I learn this new language.

Who knows, maybe you will need to maintain my code someday, so you could only be helping yourself!

Thanks in advance!

submitted by honestduane to Python
[link] [228 comments]

Posted on 16 December 2013

Eric Idle here. I've brought John Cleese, Terry Gilliam, Terry Jones and Michael Palin with me. We are Monty Python. AUA.

Hello everybody. I had so much fun last November doing my previous reddit AMA that I decided to return. I'm sure you've seen the exciting news, but here we are to confirm it, officially: Monty Python is reunited. Today is the big day and as you can imagine it's a bit of a circus round here, but we'll be on reddit from 9am for ninety minutes or so to take your questions. We'll be alternating who's answering, but everyone will be here!:

  • J0hnCleese
  • Terry_Gilliam
  • TerryJonesHere
  • _MichaelPalin

Proof: https://twitter.com/EricIdle/status/403525056740851714

Update: We're running a little late but will be with you 10-15 minutes!

Update 2: The url for tickets - http://www.montypythonlive.com - available Monday

Update 3: Thank you for all the questions. We tried to answer as many as we could. Thanks everyone!

submitted by ericidle to IAmA
[link] [7735 comments]

Posted on 21 November 2013

What you do not like in Python?

I'm a big fan of Python! I use it every day! But there are things in Python which are annoying, strange, and so forth (things you really don't like). If you have any, please share your thoughts. For example:

  • the built-in set type has methods like symmetric_difference_update. I don't like such long method names in built-in types.
submitted by krasoffski to Python
[link] [892 comments]

Posted on 18 September 2013

Python interview questions

I'm about to go to my first Python interview and I'm compiling a list of all possible interview questions. Based on resources that I've found here, here and here I noted down the following common questions, what else should I add?

easy/intermediate

  • What are Python decorators and how would you use them?
  • How would you setup many projects where each one uses different versions of Python and third party libraries?
  • What is PEP8 and do you follow its guidelines when you're coding?
  • How are arguments passed – by reference or by value? (easy, but not that easy, I'm not sure if I can answer this clearly)
  • Do you know what list and dict comprehensions are? Can you give an example?
  • Show me three different ways of fetching every third item in the list
  • Do you know what is the difference between lists and tuples? Can you give me an example for their usage?
  • Do you know the difference between range and xrange?
  • Tell me a few differences between Python 2.x and 3.x?
  • The with statement and its usage.
  • How to avoid cyclical imports without having to resort to imports in functions?
  • what's wrong with import all?
  • Why is the GIL important? (This actually puzzles me, don't know the answer)
  • What are "special" methods (<foo>), how they work, etc
  • can you manipulate functions as first-class objects?
  • the difference between "class Foo" and "class Foo(object)"

tricky, smart ones

  • how to read an 8GB file in Python?
  • what don't you like about Python?
  • can you convert ascii characters to an integer without using built in methods like string.atoi or int()? curious one

subjective ones

  • do you use tabs or spaces, which ones are better?

Ok, so should I add something else or is the list comprehensive?

submitted by dante9999 to Python
[link] [187 comments]

Posted on 19 August 2013

Python saved my ass tonight.

It's Friday night, and I'm stuck at work because Apache isn't working, and without it, I can't serve the files I need to update the embedded device I'm working on. So on a whim, I googled "python fileserver", and this little gem popped up:

python -m SimpleHTTPServer 

Running that from the directory I needed to grab files from saved me the time of debugging Apache (aka my worst nightmare), and, possibly by extension, my job. Thanks Python!
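
Worth knowing: in Python 3 the module was renamed, so the equivalent one-liner is python -m http.server, or from inside Python:

from http.server import HTTPServer, SimpleHTTPRequestHandler  # Python 3 only
HTTPServer(('', 8000), SimpleHTTPRequestHandler).serve_forever()  # serve the current directory on port 8000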

submitted by LightWolfCavalry to Python
[link] [85 comments]

Posted on 9 August 2013

Common misconceptions in Python

What are some common misconceptions that people have when programming in Python? Here are a couple that were passed around a mailing list I'm on:


'list.sort' returns the sorted list. (Wrong: it actually returns None.)
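
A quick sketch of that first one:

nums = [3, 1, 2]
print(nums.sort())        # -> None; the sort happens in place
print(nums)               # -> [1, 2, 3]
print(sorted([3, 1, 2]))  # -> [1, 2, 3]; sorted() returns a new list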


Misconception: The Python "is" operator tests for equality.

Reality: The "is" operator checks to see if two variables point to the same object.

This one is especially nasty, because for many cases, it "works", until it doesn't :)

In [1]: a = 'hello'

In [2]: b = 'hello'

In [3]: a is b

Out[3]: True

In [4]: a = 'hello world!'

In [5]: b = 'hello world!'

In [6]: a is b

Out[6]: False

In [7]: a = 3

In [8]: b = 3

In [9]: a is b

Out[9]: True

In [10]: a = 1025

In [11]: b = 1025

In [12]: a is b

Out[12]: False

This happens because the CPython implementation caches small integers and strings, so the underlying objects really are the same, sometimes.

If you want to check if two objects are equivalent, you must always use the == operator.


submitted by rhiever to Python
[link] [243 comments]

Posted on 13 May 2013

What is Python not a good language for?

I am moving from writing one-off code and scripts to developing tools which are going to be used by a larger group. I am having trouble deciding if Python is the right tool for the job.

For example, I am responsible for processing a 1GB text file into some numerical results. Python was the obvious choice for reading the text file, but I am wondering if Python is fast enough for production code.

Edit: Thanks for the all responses. I will continue to learn and develop in Python.

submitted by Hopemonster to Python
[link] [230 comments]

Posted on 6 May 2013

Why do you choose Python over other language?

Hi, coding newbie here, I want to know why you prefer Python over other languages and its pros and cons. Really interested in learning Python, any tips?

Edit: Wow, such a great feedback, as I see the main Pro is the overall badass community that Python has behind (refer to all the comments in this thread), thanks guys.

Edit 2: The question now. Python 2.x or 3.x?

submitted by Rokxx to Python
[link] [165 comments]

Posted on 5 April 2013

I use PHP. Whenever I meet a Python guy, they tell me how much better it is, and I'd like the low-down on the reasons.

I'm not bothered with the fact that PHP was not designed and has inconsistencies etc., because I know my way around it well enough that it doesn't matter. I'm curious whether using Python would help me, as I don't hear much negativity around it.

What I want to know is, in terms of web dev, are there things Python can do that PHP can't? Is the language so much better that I'll be able to write better code in less time? Is it as fast as PHP, and are the frameworks as varied and battle tested? Are there any shortcomings to Python that would trip me up?

Thanks guys.

submitted by maloney7 to Python
[link] [207 comments]

Posted on 19 November 2011

Are there any things about Python that you do *not* like, or that you wish were done differently, or that you flat out think are wrong?

I lightheartedly joked in another thread that if the person had agreed with my point (that Python 3 seems very slightly harder to code in than Python 2.x - also a lighthearted, almost completely unfounded critique), that it would be the first time I'd ever seen any Python user online agree with any criticism of any part of the language. In this last bit I'm not really joking.

I had many newbie critiques a few years ago - 'self', the fact that you can't join a string list with myList.join(', '), something about slicing that I forget now, that it was confusing which things worked in-place, and which worked on a copy, etc. - and in a forum (not reddit) where I posted up my lengthy list (mostly to see what people thought of these things), I was met with a wall of responses, all strongly in favor of every last part of all of it, and even of things I hadn't mentioned. In 3 years I realize now I have never once seen anyone critique any part of the language and not be met with all manner of deep, philosophical justifications as to why that thing or those things must be that way.

It's the perfect language, I guess.

So my new question is just straight up: IS there anything about Python you don't like? I mean, it is moving to 3, and there are changes, so clearly 2.x had room for improvement, so let's hear it. Be prepared for a battle on all fronts from everyone else in here, though, whatever you say :) I'd love to hear from the real experts, the people who usually wield seemingly powerful reasoning and long strings of computer science words in their arguments.

This itself isn't a critique, nor even a jab, but just another attempt to learn more.

submitted by gfixler to Python
[link] [576 comments]

Posted on 16 November 2011

A website that lets you browse Reddit like you're reading/coding in Python!

...or Java (and soon, Ruby, PHP, C#, etc.).

It's my first website with Flask (my first real dynamic website?). I wanted the domain to be coderedd.it, but it was too expensive :(. So I just asked my brother to help me host it.

Comments appreciated. :)

r.doqdoq.com

UPDATES:

  • NSFW indicator for Python (can't figure out where/how to place it in Java, but it still checks for NSFW so it won't load image previews)
  • don't preload all images (thanks to canuckkat)
  • use def instead of class in Python

UPDATES 2:

I just opened up the repo at bitbucket https://bitbucket.org/john2x/rdoqdoq :)

Thanks everyone!

submitted by ares623 to Python
[link] [73 comments]

Posted on 6 September 2011