commit 6cce9c05e3
parent 8bfa40e87f

    Fixed typos and articles.
@@ -60,15 +60,15 @@ Introduction

 RabbitMQ is a message broker. The principle idea is pretty simple: it accepts
 and forwards messages. You can think about it as a post office: when you send
-mail to the post box and you're pretty sure that Mr. Postman will eventually
+mail to the post box you're pretty sure that Mr. Postman will eventually
 deliver the mail to your recipient. Using this metaphor RabbitMQ is a post box,
-post office and a postman.
+a post office and a postman.

 The major difference between RabbitMQ and the post office is the fact that it
 doesn't deal with the paper, instead it accepts, stores and forwards binary
 blobs of data - _messages_.

-RabbitMQ uses a weird jargon, but it's simple once you'll get it. For example:
+RabbitMQ uses a specific jargon. For example:

 * _Producing_ means nothing more than sending. A program that sends messages
   is a _producer_. We'll draw it like that, with "P":
@@ -85,7 +85,7 @@ RabbitMQ uses a weird jargon, but it's simple once you'll get it. For example:
   is not bound by any limits, it can store how many messages you
   like - it's essentially an infinite buffer. Many _producers_ can send
   messages that go to the one queue, many _consumers_ can try to
-  receive data from one _queue_. Queue'll be drawn as like that, with
+  receive data from one _queue_. A queue will be drawn as like that, with
   its name above it:

   {% dot -Gsize="10,0.9" -Grankdir=LR%}
@@ -98,7 +98,7 @@ RabbitMQ uses a weird jargon, but it's simple once you'll get it. For example:
   }
   {% enddot %}

-* _Consuming_ has a similar meaning to receiving. _Consumer_ is a program
+* _Consuming_ has a similar meaning to receiving. A _consumer_ is a program
   that mostly waits to receive messages. On our drawings it's shown with "C":

   {% dot -Gsize="10,0.3" -Grankdir=LR %}
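The producer/queue/consumer vocabulary fixed in the hunk above can be sketched with a plain in-memory FIFO. This is only an illustration of the roles, not pika or RabbitMQ code; a `deque` stands in for the broker's "essentially infinite buffer":

```python
from collections import deque

# The queue is essentially a buffer: producers append, consumers pop.
queue = deque()

def produce(message):
    """A producer does nothing more than send (enqueue) a message."""
    queue.append(message)

def consume():
    """A consumer mostly waits for messages; here it just pops one."""
    return queue.popleft() if queue else None

# Many producers can send to the one queue...
produce("first")
produce("second")

# ...and consumers receive the data in FIFO order.
print(consume())  # first
print(consume())  # second
```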
@@ -136,10 +136,10 @@ messages from that queue.

 > #### RabbitMQ libraries
 >
-> RabbitMQ speaks AMQP protocol. To use Rabbit you'll need a library that
-> understands the same protocol as Rabbit. There is a choice of libraries
-> for almost every programming language. Python it's not different and there is
-> a bunch of libraries to choose from:
+> RabbitMQ speaks a protocol called AMQP. To use Rabbit you'll need a library
+> that understands the same protocol as Rabbit. There is a choice of libraries
+> for almost every programming language. For python it's not different and there
+> is a bunch of libraries to choose from:
 >
 > * [py-amqplib](http://barryp.org/software/py-amqplib/)
 > * [txAMQP](https://launchpad.net/txamqp)
@@ -202,12 +202,12 @@ the message will be delivered, let's name it _test_:


 At that point we're ready to send a message. Our first message will
-just contain a string _Hello World!_, we want to send it to our
+just contain a string _Hello World!_ and we want to send it to our
 _test_ queue.

-In RabbitMQ a message never can be send directly to the queue, it always
+In RabbitMQ a message can never be send directly to the queue, it always
 needs to go through an _exchange_. But let's not get dragged by the
-details - you can read more about _exchanges_ in [third part of this
+details - you can read more about _exchanges_ in [the third part of this
 tutorial]({{ python_three_url }}). All we need to know now is how to use a default exchange
 identified by an empty string. This exchange is special - it
 allows us to specify exactly to which queue the message should go.
@@ -246,7 +246,7 @@ them on the screen.
 Again, first we need to connect to RabbitMQ server. The code
 responsible for connecting to Rabbit is the same as previously.

-Next step, just like before, is to make sure that the
+The next step, just like before, is to make sure that the
 queue exists. Creating a queue using `queue_declare` is idempotent - we can
 run the command as many times you like, and only one will be created.

@@ -258,7 +258,7 @@ You may ask why to declare the queue again - we have already declared it
 in our previous code. We could have avoided that if we were sure
 that the queue already exists. For example if `send.py` program was
 run before. But we're not yet sure which
-program to run as first. In such case it's a good practice to repeat
+program to run first. In such cases it's a good practice to repeat
 declaring the queue in both programs.

 > #### Listing queues
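The idempotency point in the hunk above, declaring the same queue twice creates it only once, is why both programs may safely declare it. A toy model of that behaviour (a dict standing in for the broker's state; this is not the real `queue_declare` implementation):

```python
# Toy broker state: queue name -> list of buffered messages.
broker = {}

def queue_declare(queue):
    """Idempotent declare: create the queue only if it doesn't exist yet."""
    broker.setdefault(queue, [])

# Both send.py and receive.py can declare the queue, regardless of
# which program happens to run first.
queue_declare('test')
queue_declare('test')
print(len(broker))  # 1 - only one queue was created
```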
@@ -19,19 +19,19 @@ digraph G {
 }
 {% enddot %}

-In [previous part]({{ python_two_url}}) of this tutorial we've learned how
-to create a task queue. The core assumption behind a task queue is that a task
+In [previous part]({{ python_two_url}}) of this tutorial we created
+a task queue. The core assumption behind a task queue is that a task
 is delivered to exactly one worker. In this part we'll do something completely
 different - we'll try to deliver a message to multiple consumers. This
 pattern is known as "publish-subscribe".

-To illustrate this this tutorial, we're going to build a simple
-logging system. It will consist of two programs - first will emit log
-messages and second will receive and print them.
+To illustrate this, we're going to build a simple
+logging system. It will consist of two programs - the first will emit log
+messages and the second will receive and print them.

 In our logging system every running copy of the receiver program
-will be able to get the same messages. That way we'll be able to run one
-receiver and direct the logs to disk, in the same time we'll be able to run
+will get the same messages. That way we'll be able to run one
+receiver and direct the logs to disk; and at the same time we'll be able to run
 another receiver and see the same logs on the screen.

 Essentially, emitted log messages are going to be broadcasted to all
@@ -42,14 +42,14 @@ Exchanges
 ---------

 In previous parts of the tutorial we've understood how to send and
-receive messages. Now it's time to introduce the full messaging model
-in Rabbit.
+receive messages to and from a queue. Now it's time to introduce
+the full messaging model in Rabbit.

-Let's quickly remind what we've learned:
+Let's quickly cover what we've learned:

-* _Producer_ is user application that sends messages.
-* _Queue_ is a buffer that stores messages.
-* _Consumer_ is user application that receives messages.
+* A _producer_ is user application that sends messages.
+* A _queue_ is a buffer that stores messages.
+* A _consumer_ is user application that receives messages.


 The core idea in the messaging model in Rabbit is that the
@@ -62,7 +62,7 @@ exchange is a very simple thing. On one side it receives messages from
 producers and the other side it pushes them to queues. The exchange
 must know exactly what to do with a received message. Should it be
 appended to a particular queue? Should it be appended to many queues?
-Or should it get silently discarded. The exact rules for that are
+Or should it get discarded. The exact rules for that are
 defined by the _exchange type_.

 {% dot -Gsize="10,1.3" -Grankdir=LR %}
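The three outcomes named in the hunk above (one queue, many queues, or discarded) are decided by the exchange type. A minimal sketch of that decision, restricted for illustration to `direct` and `fanout`; the queue and key names are made up, and a real broker's routing is much richer than this:

```python
def route(exchange_type, bindings, routing_key):
    """Return the queues a message should be copied to.

    bindings maps queue name -> binding key. This only mirrors the
    rules described in the text, not RabbitMQ internals.
    """
    if exchange_type == 'fanout':
        # Broadcast: append the message to every bound queue.
        return sorted(bindings)
    if exchange_type == 'direct':
        # Append only to queues whose binding key matches.
        return sorted(q for q, key in bindings.items() if key == routing_key)
    # No matching rule: the message gets discarded.
    return []

bindings = {'disk_log': 'error', 'screen_log': 'info'}
print(route('fanout', bindings, 'ignored'))  # ['disk_log', 'screen_log']
print(route('direct', bindings, 'error'))    # ['disk_log']
print(route('direct', bindings, 'debug'))    # [] - discarded
```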
@@ -107,8 +107,8 @@ queues it knows. And that's exactly what we need for our logger.
 >     amq.headers headers
 >     ...done.
 >
-> You can see a few `amq.` exchanges. They're created by default, but with a
-> bit of luck you'll never need to use them.
+> You can see a few `amq.` exchanges. They're created by default, but
+> chances are you'll never need to use them.

 <div></div>

@@ -123,7 +123,7 @@ queues it knows. And that's exactly what we need for our logger.
 >         routing_key='test',
 >         body=message)
 >
-> The _empty string_ exchange is special: message is
+> The _empty string_ exchange is special: messages are
 > routed to the queue with name specified by `routing_key`.

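The special rule in the box above, the empty-string exchange routes to the queue named by `routing_key`, fits in a few lines of toy code. This is a model of the described behaviour only, not pika's `basic_publish`:

```python
# Toy broker: queue name -> buffered messages.
queues = {'test': []}

def publish_via_default_exchange(routing_key, body):
    """The default (empty string) exchange routes straight to the
    queue whose name equals routing_key; here, unroutable messages
    are simply dropped."""
    if routing_key in queues:
        queues[routing_key].append(body)

publish_via_default_exchange('test', 'Hello World!')
print(queues['test'])  # ['Hello World!']
```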
@@ -131,15 +131,16 @@ queues it knows. And that's exactly what we need for our logger.
 Temporary queues
 ----------------

-As you may remember previously we were using queues which had a specified name -
-`test` in first `task_queue` in second tutorial. Being able to name a
+As you may remember previously we were using queues which had a specified name
+(remember `test` and `task_queue`?). Being able to name a
 queue was crucial for us - we needed to point the workers to the same
-queue. Essentially, giving a queue a name is important when you don't
-want to loose any messages when the consumer disconnects.
+queue. Essentially, giving a queue a name is important when you
+want to share the queue between multiple consumers.

-But that's not true for our logger. We do want to hear only about
-currently flowing log messages, we do not want to hear the old
-ones. To solve that problem we need two things.
+But that's not true for our logger. We do want to hear about all
+currently flowing log messages, we do not want to receive only a subset
+of messages. We're also interested only in currently flowing messages
+not in the old ones. To solve that we need two things.

 First, whenever we connect to Rabbit we need a fresh, empty queue. To do it
 we could create a queue with a random name, or, even better - let server
@@ -179,7 +180,7 @@ digraph G {


 We've already created a fanout exchange and a queue. Now we need to
-tell the exchange to send messages to our queue. That relationship,
+tell the exchange to send messages to our queue. That relationship
 between exchange and a queue is called a _binding_.

 {% highlight python %}
@@ -193,7 +194,7 @@ our queue.
 > #### Listing bindings
 >
 > You can list existing bindings using, you guessed it,
-> `rabbitmqctl list_bindings` command.
+> `rabbitmqctl list_bindings`.


 Putting it all together
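The _binding_ relationship described above is essentially a table of (exchange, queue) pairs, which is roughly what `rabbitmqctl list_bindings` prints. A toy sketch of binding plus fanout delivery; all names here (`logs`, `q1`, `q2`) are illustrative:

```python
bindings = set()

def queue_bind(exchange, queue):
    """Record that the queue wants messages from the exchange."""
    bindings.add((exchange, queue))

def fanout_publish(exchange, queues, body):
    """A fanout exchange copies the message to every bound queue."""
    for ex, q in bindings:
        if ex == exchange:
            queues[q].append(body)

queues = {'q1': [], 'q2': [], 'unbound': []}
queue_bind('logs', 'q1')
queue_bind('logs', 'q2')
fanout_publish('logs', queues, 'warning: low disk')
print(queues['q1'], queues['q2'], queues['unbound'])
# ['warning: low disk'] ['warning: low disk'] []
```

An unbound queue receives nothing, which is exactly why the receiver must bind its fresh temporary queue before messages start to matter.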
@@ -226,9 +227,9 @@ digraph G {

 The producer program, which emits log messages, doesn't look much
 different than in previous tutorial. The most important change is
-that, we now need to publish messages to `logs` exchange instead of
-the nameless one. We need to supply the `routing_key` parameter, but
-it's value is ignored for `fanout` exchanges. Here goes the code for
+that, we now need to publish messages to the `logs` exchange instead of
+the nameless one. We need to supply a `routing_key`, but
+its value is ignored for `fanout` exchanges. Here goes the code for
 `emit_log.py` script:

 {% highlight python linenos=true %}
@@ -249,9 +250,9 @@ it's value is ignored for `fanout` exchanges. Here goes the code for
 {% endhighlight %}
 [(emit_log.py source)]({{ examples_url }}/python/emit_log.py)

-As you see, we avoided declaring exchange. If the `logs` exchange
+As you see, we avoided declaring exchange here. If the `logs` exchange
 isn't created at the time this code is executed the message will be
-lost. That's okay for us.
+lost. That's okay for us - if no consumer is listening yet (ie:
+the exchange hasn't been created) we can discard the message.

 The code for `receive_logs.py`:

@@ -16,8 +16,8 @@ digraph G {
 {% enddot %}


-In the [first part of this tutorial]({{ python_one_url }}) we've learned how to send
-and receive messages from a named queue. In this part we'll create a
+In the [first part of this tutorial]({{ python_one_url }}) we've learned how
+to send and receive messages from a named queue. In this part we'll create a
 _Task Queue_ that will be used to distribute time-consuming work across multiple
 workers.

@@ -25,7 +25,8 @@ The main idea behind Task Queues (aka: _Work Queues_) is to avoid
 doing resource intensive tasks immediately. Instead we schedule a task to
 be done later on. We encapsulate a _task_ as a message and save it to
 the queue. A worker process running in a background will pop the tasks
-and eventually execute the job.
+and eventually execute the job. When you run many workers the work will
+be shared between them.

 This concept is especially useful in web applications where it's
 impossible to handle a complex task during a short http request
@@ -38,16 +39,17 @@ Preparations
 In previous part of this tutorial we were sending a message containing
 "Hello World!" string. Now we'll be sending strings that
 stand for complex tasks. We don't have any real hard tasks, like
-image to be resized or pdf files to be rendered, so let's fake it by just
+images to be resized or pdf files to be rendered, so let's fake it by just
 pretending we're busy - by using `time.sleep()` function. We'll take
 the number of dots in the string as a complexity, every dot will
 account for one second of "work". For example, a fake task described
-by `Hello!...` string will take three seconds.
+by `Hello...` string will take three seconds.

-We need to slightly modify our `send.py` code, to allow sending
-arbitrary messages from command line:
+We need to slightly modify the _send.py_ code from previous example,
+to allow sending arbitrary messages from the command line. This program
+will schedule tasks to our work queue, so let's name it `new_task.py`:

-{% highlight python %}
+{% highlight python linenos=true linenostart=12 %}
 import sys
 message = ' '.join(sys.argv[1:]) or "Hello World!"
 channel.basic_publish(exchange='', routing_key='test',
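The `new_task.py` fragment in the hunk above joins command-line arguments into the message body, and the number of dots then encodes the fake task's duration. That logic can be reproduced standalone (argv is passed in explicitly here, since the original reads `sys.argv`):

```python
def build_message(argv):
    """Mimic new_task.py: join the args, defaulting to 'Hello World!'."""
    return ' '.join(argv[1:]) or "Hello World!"

def work_seconds(body):
    """Every dot in the body accounts for one second of fake 'work'."""
    return body.count('.')

print(build_message(['new_task.py']))            # Hello World!
print(work_seconds('Hello...'))                  # 3
print(work_seconds('Fifth message.....'))        # 5
```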
@@ -56,10 +58,11 @@ arbitrary messages from command line:
 {% endhighlight %}


-Our `receive.py` script also requires some changes: it needs to fake a
-second of work for every dot in the message body:
+Our old _receive.py_ script also requires some changes: it needs to fake a
+second of work for every dot in the message body. It will pop messages
+from the queue and do the task, so let's call it `worker.py`:

-{% highlight python %}
+{% highlight python linenos=true linenostart=13 %}
 import time

 def callback(ch, method, header, body):
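The `worker.py` callback begun in the hunk above sleeps one second per dot. A runnable version of that idea follows; the callback signature matches the old-style pika callback shown in the diff, the unused arguments are passed as `None`, and the per-dot delay is scaled down here so the example finishes quickly (the tutorial itself uses a full second):

```python
import time

DOT_SECONDS = 0.001  # scaled-down stand-in for the tutorial's 1 second

processed = []

def callback(ch, method, header, body):
    """Fake a unit of work for every dot in the message body."""
    print(" [x] Received %r" % body)
    time.sleep(body.count('.') * DOT_SECONDS)
    print(" [x] Done")
    processed.append(body)

# Simulate the broker delivering two tasks (ch/method/header unused here).
callback(None, None, None, 'First message.')
callback(None, None, None, 'Third message...')
```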
@@ -76,7 +79,7 @@ One of the advantages of using Task Queue is the
 ability to easily parallelize work. If we have too much work for us to
 handle, we can just add more workers and scale easily.

-First, let's try to run two `worker.py` scripts in the same time. They
+First, let's try to run two `worker.py` scripts at the same time. They
 will both get messages from the queue, but how exactly? Let's
 see.

@@ -86,8 +89,6 @@ script. These consoles will be our two consumers - C1 and C2.

     shell1$ ./worker.py
     [*] Waiting for messages. To exit press CTRL+C
-
-
     shell2$ ./worker.py
     [*] Waiting for messages. To exit press CTRL+C

@@ -109,8 +110,6 @@ Let's see what is delivered to our workers:

     [x] Received 'Third message...'
     [x] Received 'Fifth message.....'
-
-
     shell2$ ./worker.py
     [*] Waiting for messages. To exit press CTRL+C
     [x] Received 'Second message..'
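The transcript above, C1 receiving the 1st, 3rd and 5th messages and C2 the 2nd and 4th, is plain round-robin dispatch. A sketch of that rule (not RabbitMQ's actual scheduler, which the later hunks refine with `prefetch_count`):

```python
from itertools import cycle

def round_robin_dispatch(messages, consumer_names):
    """Send every n-th message to the n-th consumer, in turn."""
    inboxes = {name: [] for name in consumer_names}
    turns = cycle(consumer_names)
    for message in messages:
        inboxes[next(turns)].append(message)
    return inboxes

messages = ['First message.', 'Second message..', 'Third message...',
            'Fourth message....', 'Fifth message.....']
inboxes = round_robin_dispatch(messages, ['C1', 'C2'])
print(inboxes['C1'])  # ['First message.', 'Third message...', 'Fifth message.....']
print(inboxes['C2'])  # ['Second message..', 'Fourth message....']
```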
@@ -128,27 +127,27 @@ Doing our tasks can take a few seconds. You may wonder what happens if
 one of the consumers got hard job and has died while doing it. With
 our current code once RabbitMQ delivers message to the customer it
 immediately removes it from memory. In our case if you kill a worker
-we will loose the message it was just processing. We'll also loose all
+we will lose the message it was just processing. We'll also lose all
 the messages that were dispatched to this particular worker and not
 yet handled.

-But we don't want to loose any task. If a workers dies, we'd like the task
+But we don't want to lose any task. If a worker dies, we'd like the task
 to be delivered to another worker.

 In order to make sure a message is never lost, RabbitMQ
-supports message _acknowledgments_. It's an information,
+supports message _acknowledgments_. It's a bit of data,
 sent back from the consumer which tells Rabbit that particular message
 had been received, fully processed and that Rabbit is free to delete
 it.

 If consumer dies without sending ack, Rabbit will understand that a
-message wasn't processed fully and will redispatch it to another
+message wasn't processed fully and will redeliver it to another
 consumer. That way you can be sure that no message is lost, even if
 the workers occasionally die.

-There aren't any message timeouts, Rabbit will redispatch the
-message only when the worker connection dies. It's fine if processing
-a message takes even very very long time.
+There aren't any message timeouts; Rabbit will redeliver the
+message only when the worker connection dies. It's fine even if processing
+a message takes a very very long time.


 Message acknowledgments are turned on by default. Though, in previous
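The acknowledgment rule in this hunk, unacked messages go back on the queue when the consumer's connection dies, can be simulated with a per-consumer "unacked" buffer. A toy model only; consumer names and task strings are made up:

```python
from collections import deque

queue = deque(['task1', 'task2'])
unacked = {}  # consumer name -> messages delivered but not yet acked

def deliver(consumer):
    """Hand the next message to a consumer; the broker keeps it as unacked."""
    message = queue.popleft()
    unacked.setdefault(consumer, []).append(message)
    return message

def basic_ack(consumer, message):
    """The ack tells the broker it is free to delete the message."""
    unacked[consumer].remove(message)

def connection_died(consumer):
    """No ack arrived: requeue everything the dead consumer held."""
    for message in unacked.pop(consumer, []):
        queue.appendleft(message)

deliver('C1')          # C1 takes 'task1'
connection_died('C1')  # C1 dies before acking...
print(queue[0])        # task1 - it's back in the queue, nothing lost
```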
@@ -169,7 +168,7 @@ once we're done with a task.

 Using that code we may be sure that even if you kill a worker using
 CTRL+C while it was processing a message, nothing will be lost. Soon
-after the worker dies all unacknowledged messages will be redispatched.
+after the worker dies all unacknowledged messages will be redelivered.

 > #### Forgotten acknowledgment
 >
@@ -179,8 +178,8 @@ after the worker dies all unacknowledged messages will be redispatched.
 > Rabbit will eat more and more memory as it won't be able to release
 > any unacked messages.
 >
-> In order to debug this kind of mistakes you may use `rabbitmqctl`
-> to print `messages_unacknowledged` field:
+> In order to debug this kind of mistake you can use `rabbitmqctl`
+> to print the `messages_unacknowledged` field:
 >
 >     $ sudo rabbitmqctl list_queues name messages_ready messages_unacknowledged
 >     Listing queues ...
@@ -198,7 +197,7 @@ When RabbitMQ quits or crashes it will forget the queues and messages
 unless you tell it not to. Two things are required to make sure that
 messages aren't lost: we need to mark both a queue and messages as durable.

-First, we need to make sure that Rabbit will never loose our `test`
+First, we need to make sure that Rabbit will never lose our `test`
 queue. In order to do so, we need to declare it as _durable_:

     channel.queue_declare(queue='test', durable=True)
@@ -248,9 +247,9 @@ Rabbit doesn't know anything about that and will still dispatch
 messages evenly.

 That happens because Rabbit dispatches message just when a message
-enters the queue. It doesn't look in number of unacknowledged messages
+enters the queue. It doesn't look at the number of unacknowledged messages
-for a consumer. It just blindly dispatches every n-th message to a
-every consumer.
+for a consumer. It just blindly dispatches every n-th message to
+the n-th consumer.

 {% dot -Gsize="10,1.3" -Grankdir=LR %}
 digraph G {
@@ -273,7 +272,7 @@ In order to defeat that we may use `basic.qos` method with the
 `prefetch_count=1` settings. That allows us to tell Rabbit not to give
 more than one message to a worker at a time. Or, in other words, don't
 dispatch a new message to a worker until it has processed and
-acknowledged previous one.
+acknowledged the previous one.

 {% highlight python %}
 channel.basic_qos(prefetch_count=1)
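The `basic_qos(prefetch_count=1)` fix in this final hunk means: don't hand a worker a new message while it still holds an unacknowledged one. A toy dispatcher contrasting this with the blind every-n-th rule criticised in the previous hunk; the tick-based timing model and worker names are invented for illustration:

```python
def fair_dispatch(tasks, durations):
    """Dispatch with prefetch_count=1 semantics: a worker with an
    unacked (still-running) task is skipped.

    durations maps worker -> how many dispatch ticks each of its
    tasks takes before the worker acks it.
    """
    assigned = {w: [] for w in durations}
    busy_until = {w: 0 for w in durations}
    tick = 0
    for task in tasks:
        # Wait until some worker has acked its previous message.
        while all(busy_until[w] > tick for w in durations):
            tick += 1
        # Pick an idle worker and hand it exactly one message.
        worker = next(w for w in durations if busy_until[w] <= tick)
        assigned[worker].append(task)
        busy_until[worker] = tick + durations[worker]
    return assigned

# A slow worker (3 ticks per task) vs. a fast one (1 tick per task):
result = fair_dispatch(['t1', 't2', 't3', 't4', 't5'],
                       {'slow': 3, 'fast': 1})
print(result['fast'])  # ['t2', 't3', 't4'] - the fast worker does more
print(result['slow'])  # ['t1', 't5']
```

With blind round-robin both workers would get roughly half the tasks regardless of speed; here the fast worker ends up doing most of the work, which is the whole point of the qos setting.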