<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.7.3">Jekyll</generator><link href="https://mihevc.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://mihevc.org/" rel="alternate" type="text/html" /><updated>2018-08-14T19:23:24+00:00</updated><id>https://mihevc.org/</id><title type="html">Rokblog</title><subtitle>wish = ∇ happiness
</subtitle><author><name>Rok Mihevc</name></author><entry><title type="html">JupyterHub for teamwork on AWS</title><link href="https://mihevc.org/2017/10/26/jupyterhub-for-teamwork-on-aws.html" rel="alternate" type="text/html" title="JupyterHub for teamwork on AWS" /><published>2017-10-26T00:00:00+00:00</published><updated>2017-10-26T00:00:00+00:00</updated><id>https://mihevc.org/2017/10/26/jupyterhub-for-teamwork-on-aws</id><content type="html" xml:base="https://mihevc.org/2017/10/26/jupyterhub-for-teamwork-on-aws.html">&lt;p&gt;&lt;a href=&quot;http://knowyourmeme.com/memes/tldr&quot;&gt;TL;DR&lt;/a&gt; - Here’s a fancy one command script to build your own JupyterHub in the (AWS) cloud: &lt;a href=&quot;https://gist.github.com/rok/909b6bb57b856ac1e5e8f1f123286e92&quot;&gt;(click)&lt;/a&gt;&lt;/p&gt;

&lt;h1 id=&quot;the-need&quot;&gt;The need&lt;/h1&gt;

&lt;p&gt;I occasionally need a reliable, secure and shareable Jupyter(Hub) installation. Getting all three at once takes a bit of thought up front and some sysadmin maintenance later on. My wish list is usually: an HTTPS connection, authentication for multiple users and maintainable Python virtual environments.&lt;/p&gt;

&lt;h1 id=&quot;the-implementation&quot;&gt;The implementation&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;I do this just rarely enough to forget what I did the last time, so I wanted the whole process scripted. I chose CloudFormation because a deployment can easily be deleted again with a single command.&lt;/li&gt;
  &lt;li&gt;The script implements a JupyterHub installation on an EC2 machine. Supervisord restarts JupyterHub when the machine restarts.&lt;/li&gt;
  &lt;li&gt;TODO: kernels other than python3 in separate virtualenvs.&lt;/li&gt;
  &lt;li&gt;TODO: easy user management.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;the-how-to&quot;&gt;The How-To&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;Prerequisite: have the &lt;a href=&quot;https://aws.amazon.com/cli/&quot;&gt;AWS CLI&lt;/a&gt; installed and configured on your system.&lt;/li&gt;
  &lt;li&gt;Download the &lt;a href=&quot;https://gist.github.com/rok/909b6bb57b856ac1e5e8f1f123286e92&quot;&gt;script&lt;/a&gt; and save it to a file, e.g. &lt;strong&gt;&lt;em&gt;jh.yml&lt;/em&gt;&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Run the following to create the EC2 machine (this takes a couple of minutes; even once the machine is up, it is not necessarily ready yet):
    &lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;aws cloudformation create-stack \
--template-body file://jh.yml \
--profile &amp;lt;YOUR_USER_PROFILE&amp;gt; \
--stack-name jupyterhub \
--parameters ParameterKey=KeyName,ParameterValue=&amp;lt;YOUR_KEYPAIR_NAME&amp;gt; \
             ParameterKey=InstanceType,ParameterValue=t2.micro
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;Log into your EC2 machine with &lt;code class=&quot;highlighter-rouge&quot;&gt;ssh ubuntu@&amp;lt;YOUR_EC2_MACHINE_IP&amp;gt;&lt;/code&gt; and create a password for ubuntu (&lt;code class=&quot;highlighter-rouge&quot;&gt;sudo passwd ubuntu&lt;/code&gt;). Create more users if you want them (e.g. &lt;code class=&quot;highlighter-rouge&quot;&gt;sudo adduser bob&lt;/code&gt;).&lt;/li&gt;
  &lt;li&gt;To access JupyterHub, open &lt;code class=&quot;highlighter-rouge&quot;&gt;https://&amp;lt;YOUR_EC2_MACHINE_IP&amp;gt;&lt;/code&gt; and log in with one of the users’ credentials (ubuntu will be the admin user).&lt;/li&gt;
  &lt;li&gt;Run the following to destroy the EC2 machine:
    &lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;aws cloudformation delete-stack --profile &amp;lt;YOUR_USER_PROFILE&amp;gt; --stack-name jupyterhub
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;the-code&quot;&gt;The code&lt;/h1&gt;
&lt;script src=&quot;https://gist.github.com/909b6bb57b856ac1e5e8f1f123286e92.js&quot;&gt; &lt;/script&gt;</content><author><name>Rok Mihevc</name></author><category term="Python" /><category term="jupyter" /><category term="jupyterhub" /><category term="AWS" /><category term="cloudformation" /><summary type="html">TL;DR - Here’s a fancy one command script to build your own JupyterHub in the (AWS) cloud: (click)</summary></entry><entry><title type="html">Parsing IATA slotfiles</title><link href="https://mihevc.org/2017/05/01/parsing-iata-slotfiles.html" rel="alternate" type="text/html" title="Parsing IATA slotfiles" /><published>2017-05-01T00:00:00+00:00</published><updated>2017-05-01T00:00:00+00:00</updated><id>https://mihevc.org/2017/05/01/parsing-iata-slotfiles</id><content type="html" xml:base="https://mihevc.org/2017/05/01/parsing-iata-slotfiles.html">&lt;p&gt;TL;DR - Working at Amsterdam Airport I participated in a couple of projects that required parsing of slotfiles. This resulted in me writing a parser for a very old file format. I’ve published the parser on pypi \o/. &lt;a href=&quot;https://pypi.python.org/pypi/ssim/&quot;&gt;Check it out on pypi&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;about-slotfiles&quot;&gt;About slotfiles&lt;/h1&gt;

&lt;p&gt;Slotfiles are the good old aviation industry standard for exchanging slot information. A slot gives an airline the right to operate a flight (or a series of flights) with a specified aircraft from one airport to another. A typical slotfile might look like this:&lt;/p&gt;
&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SIR
/
S17
31MAR
AMS
HHV241 01APR07JUL 0200000 18973H RMFRMF0000 C2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The first five rows give metadata about the file.
In this case it is a Schedule Information Request (SIR), meaning it was issued by the slot authority to another party. This slotfile describes flights in the summer season of 2017 (S17), it was generated on March 31st (31MAR) and it describes flights arriving at or departing from Amsterdam airport (AMS). The last row of this slotfile contains a single slot.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;HHV241 01APR07JUL 0200000 18973H RMFRMF0000 C2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Reading the slot would go like this:&lt;/p&gt;
&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;H - Clearance code (H = Holding)
HV241 - Flight number
01APR07JUL - Period of operation (from 1st April to 7th July 2017)
0200000 - Days of operation (0 = flight does not operate,
          1-7 = flight operates on the given weekday, e.g. 2 stands for Tuesday)
189 - Number of seats on the aircraft
73H - Aircraft code
RMF - Next station
RMF - Final destination
0000 - Time of flight (UTC)
C - Service type (C is chartered passenger flight)
2 - Frequency rate (2 means every two weeks)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
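
&lt;p&gt;The field breakdown above can be sketched in a few lines of plain Python. This is only an illustration for slot lines shaped exactly like the example, not how the ssim parser works internally; the function name &lt;code class=&quot;highlighter-rouge&quot;&gt;parse_slot&lt;/code&gt; and the dictionary keys are my own assumptions.&lt;/p&gt;

```python
def parse_slot(line: str) -> dict:
    """Split one slot line of the example's shape into named fields.

    Assumes six whitespace-separated groups with the fixed widths used
    in the example line; real slotfiles have more variants than this.
    """
    flight, period, days, aircraft, route, service = line.split()
    return {
        "clearance_code": flight[0],
        "flight_number": flight[1:],
        "period_start": period[:5],
        "period_end": period[5:],
        "days_of_operation": days,
        "seats": int(aircraft[:3]),
        "aircraft_code": aircraft[3:],
        "next_station": route[:3],
        "final_destination": route[3:6],
        "time_utc": route[6:],
        "service_type": service[0],
        "frequency_rate": service[1:],
    }


slot = parse_slot("HHV241 01APR07JUL 0200000 18973H RMFRMF0000 C2")
print(slot["flight_number"], slot["seats"], slot["aircraft_code"])
```

&lt;p&gt;Slicing by position is enough for this one line, but it is exactly the kind of approach that breaks on the next variant of the format, which is why a tested parser is worth having.&lt;/p&gt;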

&lt;p&gt;The reason the format is so condensed probably has to do with the fact that &lt;a href=&quot;https://en.wikipedia.org/wiki/Airline_teletype_system&quot;&gt;it was invented in the 1920s&lt;/a&gt; and was designed to send slots via &lt;a href=&quot;https://en.wikipedia.org/wiki/Teletype_Corporation&quot;&gt;teletypewriters&lt;/a&gt;.
Even though it is a de facto standard of the industry, there is no open source implementation of it.
To learn more about slotfiles see: &lt;a href=&quot;http://www.slots-austria.com/jart/prj3/sca/uploads/data-uploads/downloads/e)%20Miscellanious/overall/SCR,%20SIR%20Quick%20Guide.pdf&quot;&gt;[1]&lt;/a&gt;, &lt;a href=&quot;https://www.slotcoordination.es/csee/Satellite?blobcol=urldata&amp;amp;blobheader=application%2Fpdf&amp;amp;blobkey=id&amp;amp;blobtable=MungoBlobs&amp;amp;blobwhere=1445426450338&amp;amp;ssbinary=true&quot;&gt;[2]&lt;/a&gt;, &lt;a href=&quot;http://www.slotcoordination.ch/documents/SCR_Crash_Course_Edition_2015.pdf&quot;&gt;[3]&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;experience-with-slotfiles&quot;&gt;Experience with slotfiles&lt;/h1&gt;

&lt;p&gt;Predicting future passenger flows is of great interest to Schiphol Airport, and future schedules are passed around as slotfiles, so I needed to parse various slotfiles a couple of times. My original approach used Pandas and Python’s regex module. It was fun, sometimes painful and always a bit uncertain: things worked for the summer season, but would they for the winter season? The original design had room for improvement and there were no tests! Using it was never really pleasant and more features were required. I had kind of solved my own problem, but my colleagues and the rest of the industry still had it. As we moved on to other projects I had no time left and gave up on the original effort.&lt;/p&gt;

&lt;p&gt;Discussions with colleagues from another department gave me the motivation to take all the examples of slotfiles I could find online and write a well-tested, reliable, open source parser. The process proved relatively straightforward and I released the new parser under &lt;a href=&quot;https://www.gnu.org/licenses/gpl-3.0.html&quot;&gt;GPLv3&lt;/a&gt; on &lt;a href=&quot;https://pypi.python.org/pypi/ssim/&quot;&gt;pypi.python.org/pypi/ssim/&lt;/a&gt; and &lt;a href=&quot;https://github.com/rok/ssim&quot;&gt;github&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After initially publishing the parser I received great help from colleagues (thanks Ramon and Kevin) in finding and removing quite a few bugs. Some features they requested were added, and Ramon proposed an approach that gave a considerable performance increase. At the moment three colleagues at Schiphol are using it in their daily work. There is a plan in the works to re-release the parser under Schiphol’s banner to better promote it in the industry.&lt;/p&gt;

&lt;h1 id=&quot;about-tests&quot;&gt;About tests&lt;/h1&gt;

&lt;p&gt;Parsing and expanding slots needs to return reliable results, so a lot of attention went into good test coverage.
The parser ships with sample slotfiles from various slot authorities, together with the flights we expect to get from them after parsing and expanding.
More tests can be added to further ensure the reliability and trustworthiness of the parser.
New variants of slotfiles can also be added on request.&lt;/p&gt;

&lt;h1 id=&quot;try-it-out&quot;&gt;Try it out!&lt;/h1&gt;

&lt;p&gt;If you are working with slotfiles yourself you can try using the parser by pip installing it (of course you first need a working Python environment):&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;pip install ssim
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To use it in code you can run:&lt;/p&gt;
&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import ssim

slots, header, footer = ssim.read('slotfile.SIR')
flights = ssim.expand(slots, header)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;To continue the analysis I would recommend Pandas:&lt;/p&gt;
&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import pandas as pd
df = pd.DataFrame(flights)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Or, if you prefer the command line:&lt;/p&gt;
&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ssim -i slotfile_example.SCR -o flights.csv
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Typical performance for a large slotfile would be: reading the slotfile takes about one second and expanding slots into flights takes about four seconds.&lt;/p&gt;
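
&lt;p&gt;For readers new to the term: expanding a slot means turning its period and days of operation into individual flight dates. Here is a minimal sketch of that idea, written from scratch for this post and not taken from ssim. The function name &lt;code class=&quot;highlighter-rouge&quot;&gt;expand_period&lt;/code&gt; is made up, the year has to be passed in because the slot itself carries none, the frequency rate is ignored, and periods that cross New Year are not handled.&lt;/p&gt;

```python
from datetime import date, timedelta

# Month abbreviations as they appear in slotfile dates such as "01APR".
MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def to_date(ddmmm: str, year: int) -> date:
    """Convert a DDMMM string such as '01APR' into a date in the given year."""
    return date(year, MONTHS[ddmmm[2:5]], int(ddmmm[:2]))


def expand_period(period: str, days: str, year: int) -> list:
    """Return every date in the period whose weekday is marked in `days`.

    `period` looks like '01APR07JUL' and `days` like '0200000', where
    position i (counting from 1) is non-zero when the flight operates on
    ISO weekday i (1 = Monday ... 7 = Sunday).
    """
    start = to_date(period[:5], year)
    end = to_date(period[5:], year)
    operating = {i + 1 for i, digit in enumerate(days) if digit != "0"}
    dates = []
    current = start
    while end >= current:
        if current.isoweekday() in operating:
            dates.append(current)
        current += timedelta(days=1)
    return dates


flights = expand_period("01APR07JUL", "0200000", 2017)
print(len(flights), flights[0], flights[-1])
```

&lt;p&gt;For the example slot this yields every Tuesday from 4 April to 4 July 2017. Applying the frequency rate on top (every two weeks in the example) would thin that list out further.&lt;/p&gt;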

&lt;p&gt;If you are missing a feature or have found a bug, please get in touch via &lt;a href=&quot;https://github.com/rok/ssim/issues&quot;&gt;github&lt;/a&gt; or submit a pull request. Any help / feedback is welcome!&lt;/p&gt;</content><author><name>Rok Mihevc</name></author><category term="Python" /><category term="Airlines" /><summary type="html">TL;DR - Working at Amsterdam Airport I participated in a couple of projects that required parsing of slotfiles. This resulted in me writing a parser for a very old file format. I’ve published the parser on pypi \o/. Check it out on pypi.</summary></entry><entry><title type="html">Fun with personal location history</title><link href="https://mihevc.org/2016/10/01/fun-with-personal-location-history.html" rel="alternate" type="text/html" title="Fun with personal location history" /><published>2016-10-01T00:00:00+00:00</published><updated>2016-10-01T00:00:00+00:00</updated><id>https://mihevc.org/2016/10/01/fun-with-personal-location-history</id><content type="html" xml:base="https://mihevc.org/2016/10/01/fun-with-personal-location-history.html">&lt;p&gt;So I wanted to know how much time I spend at work. I have Google track my every step through Android’s location history and I am sitting on a proper treasure trove of personal data. Also, I am an advertisement product, but that’s beside the point.
Sooo, I pulled my location data from &lt;a href=&quot;https://takeout.google.com/settings/takeout/custom/location_history&quot;&gt;location history takeout&lt;/a&gt;, started a Jupyter notebook and read my location history into a pandas dataframe.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;json&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pandas&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'LocationHistory.json'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'r'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;raw&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loads&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fh&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;raw&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'locations'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;assign&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;to_datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'timestampMs'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;astype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'ms'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sort_values&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'dt'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ascending&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;assign&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lat&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'latitudeE7'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1E7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;assign&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lon&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'longitudeE7'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1E7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;assign&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'dt'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;total_seconds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3600&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;assign&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;weekday&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'dt'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;weekday&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;assign&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'dur'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shift&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;assign&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;workday&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'weekday'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)))&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regex&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'dt|lat$|lon$|dur|hour|weekday|workday'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
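
&lt;p&gt;The &lt;code class=&quot;highlighter-rouge&quot;&gt;diff&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;shift(-1)&lt;/code&gt; pair above is easy to misread, so here is a minimal plain-Python sketch of the same idea: each location fix is credited with the time until the next fix. The function name and the sample timestamps are made up for illustration, and where pandas would give NaN for the last row this sketch uses None.&lt;/p&gt;

```python
from datetime import datetime


def hours_until_next(timestamps):
    """Return, for each timestamp, the hours until the following one.

    The last timestamp has no successor, so its duration is None.
    """
    ordered = sorted(timestamps)
    gaps = [
        (later - earlier).total_seconds() / 3600
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return gaps + [None]


# Three made-up fixes on one morning.
fixes = [
    datetime(2016, 9, 5, 8, 0),
    datetime(2016, 9, 5, 9, 30),
    datetime(2016, 9, 5, 10, 0),
]
print(hours_until_next(fixes))
```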

&lt;p&gt;Now we have a list of activities with their durations. How do we map them to work / home / other? Let’s plot a 2D histogram of the latitudes and longitudes recorded since I started working in my current position.&lt;/p&gt;

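&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;matplotlib&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mpl&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;df_subset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;'2015-09-01' &amp;lt; dt&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;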
&lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hist2d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df_subset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'lon'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df_subset&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'lat'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bins&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
               &lt;span class=&quot;n&quot;&gt;norm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mpl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;colors&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PowerNorm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cmap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mpl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cm&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Greys&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;img src=&quot;/images/location_histogram.png&quot; alt=&quot;location_histogram&quot; title=&quot;Histogram of my location.&quot; /&gt;&lt;/p&gt;

&lt;p&gt;That’s too far out, I must have traveled to different places. Let’s just zoom in on Amsterdam.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/location_histogram_2.png&quot; alt=&quot;location_histogram_2&quot; title=&quot;Zoomed-in histogram of my location.&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Ok, it seems most of my activities are logged at three locations. A quick check confirms those are my work and the two places I’ve lived at. Let’s label the activities that fall into the area around my work:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;work&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;52.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xxx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;4.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xxx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;lat_range&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xxx&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;lon_range&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xxx&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;select_work&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;work&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lat_range&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'lat'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'lat'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;work&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lat_range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;
   &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;work&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon_range&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'lon'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'lon'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;work&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lon_range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select_work&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'label'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'work'&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Etc. for home and other.&lt;/p&gt;
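
&lt;p&gt;Repeating the box test for each place can be sketched as one small function. This is a plain-Python illustration rather than the pandas code I actually ran; the function name and the coordinates below are made up, and when boxes overlap the first matching place wins.&lt;/p&gt;

```python
def label_point(lat, lon, places, lat_range, lon_range):
    """Return the name of the first place whose box contains the point.

    `places` maps a label to a (lat, lon) centre; a point belongs to a
    place when it lies strictly within the half-widths on both axes.
    Anything that matches no place is labelled 'other'.
    """
    for name, (place_lat, place_lon) in places.items():
        inside_lat = lat_range > abs(lat - place_lat)
        inside_lon = lon_range > abs(lon - place_lon)
        if inside_lat and inside_lon:
            return name
    return "other"


# Made-up coordinates for illustration only.
places = {"work": (52.30, 4.76), "home": (52.37, 4.90)}
print(label_point(52.301, 4.761, places, 0.01, 0.01))
print(label_point(51.0, 4.0, places, 0.01, 0.01))
```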

&lt;p&gt;Now we can use these labels to see how I’ve spent September 2016:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;labeled_location_times&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;'2016-09-05' &amp;lt; dt and workday == True&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;regex&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'dt|dur|label'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupby&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'label'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reset_index&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'label'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pivot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;columns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'label'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'dur'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;labeled_location_times&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;plot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;kind&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'bar'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;img src=&quot;/images/work_home_other.png&quot; alt=&quot;work_home_other&quot; title=&quot;Work, home, and other.&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I’ve spent a lot of time at work this September. Let’s see how much overtime I usually do:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;work_hours&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;query&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;'2015-09-01' &amp;lt; dt and workday == True and label == 'work'&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupby&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;agg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'dur'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'work_hours'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'sum'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}})[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'dur'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;assign&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;over_hours&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'work_hours'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;assign&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;over_hours_rolling_mean&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 
              &lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'over_hours'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rolling&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;center&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;ax&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;work_hours&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;like&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'over'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;plot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;img src=&quot;/images/overhours.png&quot; alt=&quot;overhours&quot; title=&quot;Overhours I do.&quot; /&gt;&lt;/p&gt;

&lt;p&gt;It seems I do a lot of overtime. I should probably get a hobby.
So what is my effective hourly rate? What fraction of the hours I work do I get paid for?&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;work_hours&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;to_datetime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'2016-02-01'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():]&lt;/span&gt;
           &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;assign&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eff&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'work_hours'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
           &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;I average 9.45 hours at work per day, and the fraction of those hours I am paid for averages 85.7%, so my effective hourly rate is about 85.7% of my nominal hourly rate.&lt;/p&gt;
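
&lt;p&gt;Note that the snippet above averages the daily ratios 8 / work_hours, which is not the same as dividing 8 by the average of work_hours. Here is a minimal sketch of the difference, using made-up daily hours (the list and the PAID_HOURS name are assumptions for illustration, not the data from this post):&lt;/p&gt;

```python
# Hypothetical daily hours at work (illustrative values, not the post's data).
work_hours = [8.0, 9.0, 10.0, 11.0]
PAID_HOURS = 8  # paid hours per day, as assumed in the post


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# What the snippet above computes: the mean of the daily paid/worked ratios.
mean_of_ratios = mean([PAID_HOURS / h for h in work_hours])

# The alternative: paid hours divided by the mean of the worked hours.
ratio_of_means = PAID_HOURS / mean(work_hours)

print(round(mean_of_ratios, 4), round(ratio_of_means, 4))
```

&lt;p&gt;With these made-up values the mean of the daily ratios comes out slightly higher than the ratio of the totals. Which of the two is the better measure depends on whether days or hours should carry equal weight.&lt;/p&gt;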

&lt;p&gt;So there you have it! If there is enough interest I will also publish the notebook.&lt;/p&gt;</content><author><name>Rok Mihevc</name></author><category term="Python" /><category term="Data science" /><summary type="html"></summary></entry><entry><title type="html">Spark resampling</title><link href="https://mihevc.org/2016/09/28/spark-resampling.html" rel="alternate" type="text/html" title="Spark resampling" /><published>2016-09-28T00:00:00+00:00</published><updated>2016-09-28T00:00:00+00:00</updated><id>https://mihevc.org/2016/09/28/spark-resampling</id><content type="html" xml:base="https://mihevc.org/2016/09/28/spark-resampling.html">&lt;p&gt;Working with time-dependent data in Spark, I often need to aggregate data to arbitrary time intervals. As there is no handy function for that, I (with the help of &lt;a href=&quot;https://github.com/equialgo&quot;&gt;equialgo&lt;/a&gt;) wrote a helper function that resamples a time series column to intervals of arbitrary length, which can then be used for aggregation operations.&lt;/p&gt;

&lt;p&gt;Let’s look at the function first:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;resample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;agg_interval&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;900&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time_format&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'yyyy-MM-dd HH:mm:ss'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;==&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;column&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# Convert the timestamp to unix timestamp format.&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# Unix timestamp = number of seconds since 00:00:00 UTC, 1 January 1970.&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;col_ut&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unix_timestamp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;column&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time_format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# Divide the time into discrete intervals by rounding down. &lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;col_ut_agg&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;floor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col_ut&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;agg_interval&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;agg_interval&lt;/span&gt;  

    &lt;span class=&quot;c&quot;&gt;# Convert to and return a human readable timestamp&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;F&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;from_unixtime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;col_ut_agg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
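
&lt;p&gt;The flooring step can be checked without a Spark cluster. Below is a minimal plain-Python sketch of the same arithmetic. The name floor_to_interval is hypothetical, and the sketch assumes timestamps are in UTC, whereas Spark interprets them in its session time zone:&lt;/p&gt;

```python
from datetime import datetime, timezone


def floor_to_interval(ts, agg_interval=900, time_format="%Y-%m-%d %H:%M:%S"):
    """Round a timestamp string down to the start of its interval (in seconds)."""
    # Parse the string and treat it as UTC (an assumption made for this sketch).
    parsed = datetime.strptime(ts, time_format).replace(tzinfo=timezone.utc)
    unix_seconds = int(parsed.timestamp())

    # Integer division drops the remainder; multiplying back gives the interval start.
    floored = (unix_seconds // agg_interval) * agg_interval

    # Convert back to a human readable timestamp string.
    return datetime.fromtimestamp(floored, tz=timezone.utc).strftime(time_format)


print(floor_to_interval("2016-01-20 17:08:24"))
print(floor_to_interval("2016-01-20 17:56:08"))
```

&lt;p&gt;Because 900 seconds divides an hour evenly, the interval boundaries fall on the quarter hours.&lt;/p&gt;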

&lt;p&gt;To give an example of use, let’s create a sample timestamped dataframe:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sqlContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;createDataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'dt'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'ip'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'email_provider'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;show&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;+-------------------+---------------+--------------+
|                 dt|             ip|email_provider|
+-------------------+---------------+--------------+
|2016-01-20 17:08:24|  76.60.136.211|     yahoo.com|
|2016-01-20 17:56:08| 33.243.151.184|   hotmail.com|
|2016-01-20 17:01:34|229.223.121.197|     gmail.com|
+-------------------+---------------+--------------+
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We now use the resample function to resample our data to 15-minute intervals (that is, 900 seconds):&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;withColumn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'dt_resampled'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;resample&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;agg_interval&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;900&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;show&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;+-------------------+---------------+--------------+-------------------+
|                 dt|             ip|email_provider|       dt_resampled|
+-------------------+---------------+--------------+-------------------+
|2016-01-20 17:08:24|  76.60.136.211|     yahoo.com|2016-01-20 17:00:00|
|2016-01-20 17:56:08| 33.243.151.184|   hotmail.com|2016-01-20 17:45:00|
|2016-01-20 17:01:34|229.223.121.197|     gmail.com|2016-01-20 17:00:00|
+-------------------+---------------+--------------+-------------------+
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We now use the new ‘dt_resampled’ column to group rows by interval and email provider, then count the rows in each group.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;df_resampled&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'dt_resampled'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'email_provider'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df_resampled&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;show&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;+-------------------+--------------+-----+
|       dt_resampled|email_provider|count|
+-------------------+--------------+-----+
|2016-01-20 16:30:00|   hotmail.com|   31|
|2016-01-20 16:45:00|     gmail.com|   12|
|2016-01-20 16:00:00|     yahoo.com|   39|
+-------------------+--------------+-----+
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
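
&lt;p&gt;For intuition, the group-and-count step can be sketched in plain Python with a Counter. The rows below are made up for illustration and are not the data behind the table above:&lt;/p&gt;

```python
from collections import Counter

# Hypothetical rows of (dt_resampled, email_provider); illustrative values only.
rows = [
    ("2016-01-20 17:00:00", "yahoo.com"),
    ("2016-01-20 17:45:00", "hotmail.com"),
    ("2016-01-20 17:00:00", "gmail.com"),
    ("2016-01-20 17:00:00", "yahoo.com"),
]

# Counting identical (interval, provider) pairs mirrors a group-by followed by a count.
counts = Counter(rows)

for (interval, provider), n in sorted(counts.items()):
    print(interval, provider, n)
```

&lt;p&gt;Each distinct pair of interval and provider becomes one output row, just like one group in the Spark result.&lt;/p&gt;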

&lt;p&gt;The data is now resampled and aggregated; the only thing left is to plot it. We move the aggregated DataFrame to Pandas, pivot it on the ‘email_provider’ column and finally plot the counts over time:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;df_resampled&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toPandas&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; \
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pivot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;index&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'dt_resampled'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;columns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'email_provider'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'count'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; \
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;plot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;figsize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;title&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'Count emails per 15 minute interval'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;&lt;img src=&quot;/images/2016-09-27-spark-resampling_7_1.png&quot; alt=&quot;png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As shown, resampling in Spark can be easy and fast with a helper function. The presented function works for intervals from one second up to centuries; because it relies on Unix timestamps in whole seconds, sub-second intervals are not supported. The one downside is that leap years make the timestamps of long intervals look less nice, and solving for that would make the function much more complicated, as you can imagine by observing how the Gregorian calendar shifts over time:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/2016-09-27-gregoriancalendarleap_solstice.svg&quot; alt=&quot;svg&quot; /&gt;&lt;/p&gt;</content><author><name>Rok Mihevc</name></author><category term="python" /><category term="notebook" /><summary type="html">Working with time-dependent data in Spark, I often need to aggregate data to arbitrary time intervals. As there is no handy function for that, I (with the help of equialgo) wrote a helper function that resamples a time series column to intervals of arbitrary length, which can then be used for aggregation operations.</summary></entry><entry><title type="html">Letsencrypt Ansible</title><link href="https://mihevc.org/2016/09/27/lets-encrypt-ansible.html" rel="alternate" type="text/html" title="Letsencrypt Ansible" /><published>2016-09-27T00:00:00+00:00</published><updated>2016-09-27T00:00:00+00:00</updated><id>https://mihevc.org/2016/09/27/lets-encrypt-ansible</id><content type="html" xml:base="https://mihevc.org/2016/09/27/lets-encrypt-ansible.html">&lt;p&gt;I recently set up a &lt;a href=&quot;http://snowplowanalytics.com/&quot;&gt;Snowplow&lt;/a&gt; collector at work and needed a reliable way of generating https certificates. The collector (think of it as a very dumb web server with great logging) sits on a subdomain of a website and listens to events sent by javascript tags triggered by visitors of the site. To have this event traffic from visitors to the collector encrypted I needed an https certificate.
It is relatively straightforward to create a letsencrypt certificate with the great DNS challenge client letsencrypt provides, but I wanted an automatable way to do it, on a machine I can later destroy. As the whole project is hosted on AWS EC2, I decided to use Ansible to do it. The script does the following:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Spins up a nano EC2 instance.&lt;/li&gt;
  &lt;li&gt;Points a domain at it with Route 53 DNS service of AWS.&lt;/li&gt;
  &lt;li&gt;Installs letsencrypt on the instance and runs it to generate the certificate.&lt;/li&gt;
  &lt;li&gt;Copies the certificate into AWS certificate management service.&lt;/li&gt;
  &lt;li&gt;Destroys the nano instance and Route 53 DNS entry.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can find the code &lt;a href=&quot;https://github.com/rok/letsencrypt-ansible&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let me know if you actually end up using it.&lt;/p&gt;</content><author><name>Rok Mihevc</name></author><category term="ansible," /><category term="crypto," /><category term="automation" /><summary type="html">I recently set up a Snowplow collector at work and needed a reliable way of generating https certificates. The collector (think of it as a very dumb web server with great logging) sits on a subdomain of a website and listens to events sent by javascript tags triggered by visitors of the site. To have this event traffic from visitors to the collector encrypted I needed an https certificate. It is relatively straightforward to create a letsencrypt certificate with the great DNS challenge client letsencrypt provides, but I wanted an automatable way to do it, on a machine I can later destroy. As the whole project is hosted on AWS EC2, I decided to use Ansible to do it. The script does the following:</summary></entry><entry><title type="html">Crontabed PyDrive uploader</title><link href="https://mihevc.org/2016/02/04/crontabed-pydrive-uploader.html" rel="alternate" type="text/html" title="Crontabed PyDrive uploader" /><published>2016-02-04T00:00:00+00:00</published><updated>2016-02-04T00:00:00+00:00</updated><id>https://mihevc.org/2016/02/04/crontabed-pydrive-uploader</id><content type="html" xml:base="https://mihevc.org/2016/02/04/crontabed-pydrive-uploader.html">&lt;p&gt;I am preparing a small data science-y project, where I will be collecting a bunch of logs on a network-connected Raspberry Pi. I want to upload the logs to online cloud storage automatically.
I have a lot of storage on my Google Drive, and it has a great &lt;a href=&quot;https://developers.google.com/drive/&quot;&gt;API&lt;/a&gt; with a &lt;a href=&quot;http://pythonhosted.org/PyDrive/&quot;&gt;nice Python wrapper&lt;/a&gt;, so I decided to use it.
The setup is quite simple and I decided to write it up for my future reference and the intertubes.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://developers.google.com/drive/v3/web/quickstart/python#step_1_turn_on_the_api_name&quot;&gt;Enable the Drive API with your Google account&lt;/a&gt;. Follow only Step 1, then download the client secret file and name it &lt;em&gt;client_secrets.json&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;To install PyDrive run:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;pip install pydrive&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;I used Python 2.&lt;/p&gt;

&lt;p&gt;Copy the following code into &lt;em&gt;uploader.py&lt;/em&gt; and store it in the same location as the &lt;em&gt;client_secrets.json&lt;/em&gt;.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pydrive.auth&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GoogleAuth&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pydrive.drive&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GoogleDrive&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;os&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;argparse&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;PARENT_ID&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;LONG_FOLDER_ID_STRING&quot;&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Parse the passed arguments&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;parser&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;argparse&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ArgumentParser&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;parser&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;add_argument&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;files&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;help&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;List files to be uploaded.&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nargs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;+&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Define the credentials folder&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;home_dir&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;expanduser&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;~&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;credential_dir&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;home_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;.credentials&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exists&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;credential_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;makedirs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;credential_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;credential_path&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;credential_dir&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;pydrive-credentials.json&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Start authentication&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;gauth&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GoogleAuth&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Try to load saved client credentials&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;gauth&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LoadCredentialsFile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;credential_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gauth&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;credentials&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;is&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# Authenticate if they're not there&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;gauth&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CommandLineAuth&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;elif&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gauth&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;access_token_expired&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# Refresh them if expired&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;gauth&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Refresh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# Initialize the saved creds&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;gauth&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Authorize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Save the current credentials to a file&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;gauth&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SaveCredentialsFile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;credential_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;drive&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GoogleDrive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gauth&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Upload the files&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parser&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse_args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;files&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;new_file&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;drive&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CreateFile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;parents&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;id&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PARENT_ID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}],&lt;/span&gt;
                                              &lt;span class=&quot;s&quot;&gt;&quot;mimeType&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;text/plain&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;new_file&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SetContentFile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;new_file&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Upload&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Pick a folder in Google Drive and store its id under the PARENT_ID variable in uploader.py. The id is the string in the URL of your folder, e.g.: &lt;em&gt;https://drive.google.com/drive/folders/LONG_FOLDER_ID_STRING&lt;/em&gt;.&lt;/p&gt;
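
&lt;p&gt;The id can also be pulled out of the URL in code. This is only a sketch: it assumes Python 3 and that the id is the last path segment of the folder URL, and the helper name &lt;em&gt;folder_id_from_url&lt;/em&gt; is made up for illustration, it is not part of uploader.py.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;from urllib.parse import urlparse


def folder_id_from_url(url):
    # Hypothetical helper: assumes the folder id is the last path segment.
    # urlparse drops any query string such as ?usp=sharing before we split.
    path = urlparse(url).path.rstrip('/')
    return path.rsplit('/', 1)[-1]


# prints LONG_FOLDER_ID_STRING
print(folder_id_from_url('https://drive.google.com/drive/folders/LONG_FOLDER_ID_STRING'))&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;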

&lt;p&gt;Run&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;python uploader.py&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;and follow the initial authentication instructions. Through the prompt you will be given a URL, similar to &lt;a href=&quot;https://accounts.google.com/o/oauth2/auth?scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&amp;amp;redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&amp;amp;response_type=code&amp;amp;client_id={ID_STRING}.apps.googleusercontent.com&amp;amp;access_type=offline&quot;&gt;this one&lt;/a&gt;. Replace the {ID_STRING} with the one you were given and open the link in a browser. You will be asked to confirm an authorisation request and will receive an authorisation code. Enter the code into the prompt and your uploader is authenticated. Delete &lt;em&gt;client_secrets.json&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;You can now upload files to the selected Google Drive folder by running:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;python uploader.py example.txt&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;To run the upload on a schedule, open your crontab for editing:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt; crontab &lt;span class=&quot;nt&quot;&gt;-e&lt;/span&gt; &lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
&lt;p&gt;and add the following line to it:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;0 &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt; python ~/uploader.py ~/fr24feed.log &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; truncate &lt;span class=&quot;nt&quot;&gt;-s0&lt;/span&gt; ~/fr24feed.log&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
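
&lt;p&gt;One thing to keep in mind with this cron line: anything written to the log between the upload and the truncate is thrown away. A possible alternative, sketched below, is to move the log aside first and upload the moved copy. This assumes Python 3 and that whatever writes the log reopens it by name (a long-running writer that keeps the file open would keep writing to the moved file). &lt;em&gt;rotate_log&lt;/em&gt; is a hypothetical helper, not part of uploader.py.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import os
import time


def rotate_log(path):
    # Hypothetical helper: move the live log aside under a timestamped name
    # and return the new path, which can then be passed to the uploader.
    # os.rename within one filesystem is a single atomic step on POSIX.
    rotated = '{}.{}'.format(path, time.strftime('%Y%m%d-%H%M%S'))
    os.rename(path, rotated)
    return rotated


# Example use (path taken from the cron line above):
# rotated = rotate_log(os.path.expanduser('~/fr24feed.log'))&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;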

&lt;p&gt;In my case I am uploading a log of transponder signals of nearby planes. I am using &lt;a href=&quot;https://www.flightradar24.com/raspberry-pi&quot;&gt;Flightradar24&lt;/a&gt;'s packaged &lt;a href=&quot;https://github.com/antirez/dump1090&quot;&gt;dump1090&lt;/a&gt; to decode and record the transponder signals.&lt;/p&gt;

&lt;p&gt;We now have an automated uploader to the cloud. Log uploading is only one of many possible uses: motion-triggered videos, regular file backups and more could be uploaded the same way.&lt;/p&gt;</content><author><name>Rok Mihevc</name></author><category term="raspberry, python, google drive" /><summary type="html">I am preparing a small data science-y project, where I will be collecting a bunch of logs on a network-connected Raspberry Pi. I want to upload the logs to online cloud storage automatically. I have a lot of storage on my Google Drive and it has a great API with a nice Python wrapper, so I decided to use it. The setup is quite simple and I decided to write it up for my future reference and the intertubes.</summary></entry><entry><title type="html">Raspberry Pi based wireless motion triggered camera</title><link href="https://mihevc.org/2013/04/25/raspberry-pi-based-wireless-motion-triggered-camera.html" rel="alternate" type="text/html" title="Raspberry Pi based wireless motion triggered camera" /><published>2013-04-25T00:00:00+00:00</published><updated>2013-04-25T00:00:00+00:00</updated><id>https://mihevc.org/2013/04/25/raspberry-pi-based-wireless-motion-triggered-camera</id><content type="html" xml:base="https://mihevc.org/2013/04/25/raspberry-pi-based-wireless-motion-triggered-camera.html">&lt;h3 id=&quot;the-setup&quot;&gt;The setup&lt;/h3&gt;

&lt;p&gt;To start the project I wrote a fresh image of &lt;a href=&quot;http://www.raspberrypi.org/downloads&quot;&gt;Raspbian wheezy&lt;/a&gt; onto a 4GB SD card with &lt;a href=&quot;http://sourceforge.net/projects/win32diskimager/&quot;&gt;win32diskimager&lt;/a&gt;.
The Raspberry Pi was connected to my home router and accessed through the LAN with &lt;a href=&quot;http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html&quot;&gt;putty&lt;/a&gt;.
I also used a Logitech C270 webcam and an Edimax EW-7811Un wireless adapter.&lt;/p&gt;

&lt;p&gt;I expanded the root partition to fill the SD card with raspi-config, using its expand_rootfs option:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;raspi-config&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;To set the proper timezone run:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;dpkg-reconfigure tzdata&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;motion&quot;&gt;Motion&lt;/h3&gt;

&lt;p&gt;References: [&lt;a href=&quot;http://through-the-interface.typepad.com/through_the_interface/2012/09/creating-a-motion-detecting-security-cam-with-a-raspberry-pi-part-2.html&quot;&gt;1&lt;/a&gt;] [&lt;a href=&quot;http://jeremyblythe.blogspot.co.uk/2012/06/motion-google-drive-uploader-and.html&quot;&gt;2&lt;/a&gt;]&lt;/p&gt;

&lt;p&gt;First we update the repository list, upgrade the system and install &lt;a href=&quot;http://www.lavrsen.dk/foswiki/bin/view/Motion/WebHome&quot;&gt;motion&lt;/a&gt;, the motion detection software. This will take a while; optionally you can skip apt-get upgrade.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt-get update &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt-get upgrade &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt-get install motion&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;We enable the motion daemon in /etc/default/motion by setting &lt;em&gt;start_motion_daemon=yes&lt;/em&gt; and restart the system to see if it boots up properly.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;vi /etc/default/motion
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;reboot&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;At this point you might want to test whether motion actually works. Make some motion in front of the camera and check if any images and videos were stored in /tmp/motion. If yes, continue! :)&lt;/p&gt;

&lt;p&gt;Next, install the Python package manager pip and the Python module gdata:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;apt-get install python-pip
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;pip install gdata&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Then download &lt;a href=&quot;http://files.mihevc.org/raspi/motion/uploader&quot;&gt;uploader.py&lt;/a&gt; and &lt;a href=&quot;http://files.mihevc.org/raspi/motion/uploader.cfg&quot;&gt;uploader.cfg&lt;/a&gt;. Configure uploader.cfg with your Google account credentials to enable motion to upload recorded videos.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;wget http://files.mihevc.org/raspi/motion/uploader &lt;span class=&quot;nt&quot;&gt;-O&lt;/span&gt; /etc/motion/uploader.py
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;wget http://files.mihevc.org/raspi/motion/uploader.cfg &lt;span class=&quot;nt&quot;&gt;-O&lt;/span&gt; /etc/motion/uploader.cfg
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;chmod +x /etc/motion/uploader.py
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;vi /etc/motion/uploader.cfg&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Now reopen /etc/motion/motion.conf and add the following line: &lt;em&gt;on_movie_end python /etc/motion/uploader.py /etc/motion/uploader.cfg %f&lt;/em&gt;&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;vi /etc/motion/motion.conf
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;reboot&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;We create a folder named motion on our Google Drive; this is where the captured videos will be uploaded to. Now we plug in the webcam, restart the system and test it by, well, moving. The final result should be an email to our Gmail account with a link to the captured video.
If something is not working it is worth checking the /tmp/motion folder: if there are no files in it, motion is not working properly. If files appear in /tmp/motion but don’t upload to your Google Drive account, the problem is in the uploader.&lt;/p&gt;
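
&lt;p&gt;For the first check, a few lines of Python can show what motion wrote most recently. This is just a sketch assuming Python 3; &lt;em&gt;newest_files&lt;/em&gt; is a made-up helper name and /tmp/motion is the output folder mentioned above.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import os


def newest_files(directory, count=5):
    # Hypothetical helper: return up to `count` file names from `directory`,
    # most recently modified first.
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]
    files = [p for p in paths if os.path.isfile(p)]
    files.sort(key=os.path.getmtime, reverse=True)
    return [os.path.basename(p) for p in files[:count]]


# An empty list here means motion has not stored anything yet.
if os.path.isdir('/tmp/motion'):
    print(newest_files('/tmp/motion'))&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;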

&lt;h3 id=&quot;wlan&quot;&gt;WLAN&lt;/h3&gt;

&lt;p&gt;References: [&lt;a href=&quot;http://svay.com/blog/setting-up-a-wifi-connection-on-the-raspberrypi/&quot;&gt;3&lt;/a&gt;] [&lt;a href=&quot;http://www.linux-magazine.com/Online/Blogs/Productivity-Sauce/How-to-Quickly-Configure-Wireless-WPA-Connection-on-Raspberry-Pi&quot;&gt;4&lt;/a&gt;] [&lt;a href=&quot;http://rpi.tnet.com/project/scripts/wifi_check&quot;&gt;5&lt;/a&gt;]&lt;/p&gt;

&lt;p&gt;To set up the wireless connection we first open the dedicated config file:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;vi /etc/network/interfaces&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The “wlan0 part” should look something like this (but various other setups are possible).&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;auto wlan0
iface wlan0 inet dhcp
wpa-ssid YOUR_SSID
wpa-psk your_password&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;If you have an Edimax EW-7811Un wireless adapter, you can ensure it never enters power-saving mode by setting &lt;em&gt;options 8192cu rtw_power_mgnt=0 rtw_enusbss=0&lt;/em&gt; in /etc/modprobe.d/8192cu.conf:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;vi /etc/modprobe.d/8192cu.conf&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Further, to make sure the WLAN connection is reestablished if it drops, we download a script that reconnects when the connection is lost and add it to cron.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;wget http://files.mihevc.org/raspi/motion/WiFi_Check &lt;span class=&quot;nt&quot;&gt;-O&lt;/span&gt; /usr/local/bin/WiFi_Check
&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;chmod 0755 /usr/local/bin/WiFi_Check
crontab &lt;span class=&quot;nt&quot;&gt;-e&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Add the following line at the end:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;k&quot;&gt;*&lt;/span&gt;/5 &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;sudo&lt;/span&gt; /usr/local/bin/WiFi_Check&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Finally we reboot to finish setting up.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;reboot&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Now we unplug the wired connection, set up the camera and Pi in the desired position and see if everything works.
You can tweak the /etc/motion/motion.conf settings for higher resolution etc.&lt;/p&gt;</content><author><name>Rok Mihevc</name></author><category term="raspberry linux video" /><summary type="html">The setup</summary></entry><entry><title type="html">Blog fashion makeover</title><link href="https://mihevc.org/2012/07/18/blog-fashion-makeover.html" rel="alternate" type="text/html" title="Blog fashion makeover" /><published>2012-07-18T00:00:00+00:00</published><updated>2012-07-18T00:00:00+00:00</updated><id>https://mihevc.org/2012/07/18/blog-fashion-makeover</id><content type="html" xml:base="https://mihevc.org/2012/07/18/blog-fashion-makeover.html">&lt;p&gt;Proin at eros non eros adipiscing mollis. Donec semper turpis sed diam. Sed consequat ligula nec tortor. Integer eget sem. Ut vitae enim eu est vehicula gravida. Morbi ipsum ipsum, porta nec, tempor id, auctor vitae, purus. Pellentesque neque. Nulla luctus erat vitae libero. Integer nec enim. Phasellus aliquam enim et tortor. Quisque aliquet, quam elementum condimentum feugiat, tellus odio consectetuer wisi, vel nonummy sem neque in elit. Curabitur eleifend wisi iaculis ipsum. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. In non velit non ligula laoreet ultrices. Praesent ultricies facilisis nisl. Vivamus luctus elit sit amet mi. Phasellus pellentesque, erat eget elementum volutpat, dolor nisl porta neque, vitae sodales ipsum nibh in ligula. Maecenas mattis pulvinar diam. Curabitur sed leo.&lt;/p&gt;</content><author><name>Rok Mihevc</name></author><summary type="html">Proin at eros non eros adipiscing mollis. Donec semper turpis sed diam. Sed consequat ligula nec tortor. Integer eget sem. Ut vitae enim eu est vehicula gravida. Morbi ipsum ipsum, porta nec, tempor id, auctor vitae, purus. Pellentesque neque. Nulla luctus erat vitae libero. Integer nec enim. Phasellus aliquam enim et tortor. 
Quisque aliquet, quam elementum condimentum feugiat, tellus odio consectetuer wisi, vel nonummy sem neque in elit. Curabitur eleifend wisi iaculis ipsum. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. In non velit non ligula laoreet ultrices. Praesent ultricies facilisis nisl. Vivamus luctus elit sit amet mi. Phasellus pellentesque, erat eget elementum volutpat, dolor nisl porta neque, vitae sodales ipsum nibh in ligula. Maecenas mattis pulvinar diam. Curabitur sed leo.</summary></entry><entry><title type="html">HowTo - Debian Live NAS with USB</title><link href="https://mihevc.org/2012/05/01/howto-debian-live-nas-with-usb.html" rel="alternate" type="text/html" title="HowTo - Debian Live NAS with USB" /><published>2012-05-01T00:00:00+00:00</published><updated>2012-05-01T00:00:00+00:00</updated><id>https://mihevc.org/2012/05/01/howto---debian-live-nas-with-usb</id><content type="html" xml:base="https://mihevc.org/2012/05/01/howto-debian-live-nas-with-usb.html">&lt;p&gt;I had an old computer or two lying around, so I decided to cannibalize them and make a cheap &lt;a href=&quot;http://en.wikipedia.org/wiki/Network-attached_storage&quot;&gt;NAS&lt;/a&gt; with the old disks. The main ideas were to keep it easy to maintain and to buy no extra components.
Here’s what I did:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Hardware: I used an old desktop computer with a BIOS that can boot from USB and an old 1GB USB stick (500 MB would be enough)&lt;/li&gt;
  &lt;li&gt;I downloaded &lt;a href=&quot;https://launchpad.net/win32-image-writer&quot;&gt;Image Writer for Windows&lt;/a&gt; and wrote the &lt;a href=&quot;http://cdimage.debian.org/cdimage/release/current-live/i386/usb-hdd/&quot;&gt;Debian Live image&lt;/a&gt; (debian-live-6.0.4-i386-standard.img in my case - I only wanted the command line) to the USB stick.&lt;/li&gt;
  &lt;li&gt;Plugged the USB in and opened BIOS, to set it as first in the boot order.&lt;/li&gt;
  &lt;li&gt;In the boot manager press Esc and write “live persistent”.&lt;/li&gt;
  &lt;li&gt;Install some packages needed - “sudo apt-get install parted ssh samba ntfs-3g hdparm”&lt;/li&gt;
  &lt;li&gt;Partition the unpartitioned part of the USB stick with “parted /dev/sda” (where /dev/sda is the USB stick you’re using). Use “help mkpart” and “print” to help with partitioning. Make a new partition with e.g. “mkpart primary 302MB 1011MB”.&lt;/li&gt;
  &lt;li&gt;Make a filesystem on the new partition: “mkfs -t ext2 -L live-rw /dev/sda2”&lt;/li&gt;
  &lt;li&gt;Configure the bootloader, so it boots automatically after 5 seconds: “sudo nano /live/image/syslinux/syslinux.cfg”&lt;/li&gt;
  &lt;li&gt;INSTRUCTIONS HERE&lt;/li&gt;
  &lt;li&gt;Make default booting option a persistent one “sudo nano /live/image/syslinux/live.cfg”&lt;/li&gt;
  &lt;li&gt;INSTRUCTIONS HERE&lt;/li&gt;
  &lt;li&gt;Set passwords for minimal security: “sudo passwd root” and “sudo passwd user”&lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Find the hard disk device names (in my case all partitions were NTFS): &lt;code class=&quot;highlighter-rouge&quot;&gt;&quot;sudo fdisk -l | grep NTFS&quot;&lt;/code&gt; and add them to /etc/fstab: “/dev/sda1 /mnt/windows ntfs-3g defaults 0 0” to have them mounted on startup.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;To have disks spin down if idle edit /etc/hdparm.conf: “/dev/sda { spindown_time = 120 }”&lt;/li&gt;
  &lt;li&gt;Configure samba, to have it share the disks within your network. See &lt;a href=&quot;http://www.debuntu.org/guest-file-sharing-with-samba&quot;&gt;this&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result should be a USB stick which boots up a nice light Debian system, shares your files across the LAN and is easy to maintain if you have basic Linux knowledge.
I might reduce the power consumption of my setup in the future by replacing the old computer with a &lt;a href=&quot;http://www.raspberrypi.org/&quot;&gt;Raspberry Pi&lt;/a&gt; to make the whole thing much smaller and more economical (supposedly it only consumes ~5W).&lt;/p&gt;</content><author><name>Rok Mihevc</name></author><category term="NAS Linux Live Debian USB" /><summary type="html">I had an old computer or two lying around, so I decided to cannibalize them and make a cheap NAS with the old disks. The main ideas were to keep it easy to maintain and to buy no extra components. Here’s what I did:</summary></entry><entry><title type="html">Wordpress at Dreamhost hacked</title><link href="https://mihevc.org/2012/02/28/wordpress-at-dreamhost-hacked.html" rel="alternate" type="text/html" title="Wordpress at Dreamhost hacked" /><published>2012-02-28T00:00:00+00:00</published><updated>2012-02-28T00:00:00+00:00</updated><id>https://mihevc.org/2012/02/28/wordpress-at-dreamhost-hacked</id><content type="html" xml:base="https://mihevc.org/2012/02/28/wordpress-at-dreamhost-hacked.html">&lt;p&gt;Recently someone gained access to the admin area of my wordpress sites. It would appear that he added a small piece of code to one php file via the embedded wordpress editor. This thing ran itself via the eval function and copied itself into each .php file of wordpress (~400 files), then served some ads on the website via a .js file.
Fortunately, the solution to this is googlable and is a bit of bash-fu, see below:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;find &lt;span class=&quot;nb&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-type&lt;/span&gt; d &lt;span class=&quot;nt&quot;&gt;-perm&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-o&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;w &lt;span class=&quot;nt&quot;&gt;-print&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-exec&lt;/span&gt; chmod 770 &lt;span class=&quot;o&quot;&gt;{}&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\;&lt;/span&gt;
find &lt;span class=&quot;nb&quot;&gt;.&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-wholename&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'*wp-content/uploads/*.php'&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-exec&lt;/span&gt; rm &lt;span class=&quot;nt&quot;&gt;-rf&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{}&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\;&lt;/span&gt;
find ./ &lt;span class=&quot;nt&quot;&gt;-name&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;*.php&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-type&lt;/span&gt; f |  xargs sed &lt;span class=&quot;nt&quot;&gt;-i&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'s#&amp;lt;?php /\*\*/ eval(base64_decode(&quot;aWY.*?&amp;gt;##g'&lt;/span&gt; 2&amp;gt;&amp;amp;1
find ./ &lt;span class=&quot;nt&quot;&gt;-name&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;*.php&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-type&lt;/span&gt; f |  xargs sed &lt;span class=&quot;nt&quot;&gt;-i&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'/./,$!d'&lt;/span&gt; 2&amp;gt;&amp;amp;1&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
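
&lt;p&gt;Before letting sed rewrite hundreds of files it can help to see which ones are actually affected. The following is a minimal Python 3 sketch of such a dry run, not part of the original fix. The function name &lt;em&gt;infected_php_files&lt;/em&gt; is made up for illustration and the marker is copied from the sed expression above. It only reads files and changes nothing.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import os

# Byte string the injected code starts with, taken from the sed expression above.
MARKER = b'eval(base64_decode("aWY'


def infected_php_files(root):
    # Hypothetical helper: walk `root` and return, sorted, the .php files
    # whose content contains MARKER. Files are only read, never modified.
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith('.php'):
                path = os.path.join(dirpath, name)
                with open(path, 'rb') as handle:
                    if MARKER in handle.read():
                        hits.append(path)
    return sorted(hits)


if __name__ == '__main__':
    # Run from the WordPress root, like the find commands above.
    for path in infected_php_files('.'):
        print(path)&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;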

&lt;p&gt;&lt;em&gt;Sources:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;[1] &lt;a href=&quot;http://danhilltech.tumblr.com/post/18085864093/if-you-get-eval-base64-hacked-on-wordpress-dreamhost&quot;&gt;http://danhilltech.tumblr.com/post/18085864093/if-you-get-eval-base64-hacked-on-wordpress-dreamhost&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;[2] &lt;a href=&quot;http://blog.sucuri.net/2010/05/simple-cleanup-solution-for-the-latest-wordpress-hack.html&quot;&gt;http://blog.sucuri.net/2010/05/simple-cleanup-solution-for-the-latest-wordpress-hack.html&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;</content><author><name>Rok Mihevc</name></author><category term="wordpress hosting dreamhost hacked bash php" /><summary type="html">Recently someone gained access to the admin area of my wordpress sites. It would appear that he added a small piece of code to one php file via the embedded wordpress editor. This thing ran itself via the eval function and copied itself into each .php file of wordpress (~400 files), then served some ads on the website via a .js file. Fortunately, the solution to this is googlable and is a bit of bash-fu, see below:</summary></entry></feed>