Get DisplayLink to work on Lenovo Y700 after upgrade to Debian Buster

Hardware

Laptop:
  • Lenovo Ideapad Y700
Graphics cards (in my system, they are configured to work in hybrid mode):
  • GeForce GTX 960M – NVIDIA Corporation GM107M (rev ff)
  • Intel Corporation HD Graphics 530 (rev 06)
Docking station with DisplayLink support:
  • ThinkPad Basic USB 3.0 Dock, Model No. DL3700-ESS
    It is connected to the laptop via a USB 3.0 port and has its own power supply.

Using the NVIDIA graphics card

The following point is probably irrelevant to DisplayLink usage and problems. However, it is part of my environment, and I mention it for completeness' sake.

The laptop is configured, as instructed by https://wiki.debian.org/Bumblebee, to work with the Intel graphics card. The NVIDIA card is used by applications running under optirun. I had to modify /etc/bumblebee/bumblebee.conf to use KernelDriver=nvidia-current rather than KernelDriver=nvidia.
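For reference, the relevant fragment of my /etc/bumblebee/bumblebee.conf looks roughly like this (only the changed line matters; the surrounding settings may differ in your file):

```ini
[driver-nvidia]
# The default was KernelDriver=nvidia
KernelDriver=nvidia-current
```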

Connecting two additional displays to the laptop

To install the appropriate driver:

  • git clone https://github.com/AdnanHodzic/displaylink-debian.git
  • Follow the instructions in README.md

When everything works properly, three displays are identified by xrandr -q | egrep axis as follows:

  • eDP-1 – laptop’s display
  • HDMI-1 – external display connected via laptop’s HDMI port
  • DVI-I-1-1 – external display connected via DisplayLink on the docking station

Note that those displays could have different identifiers (such as DP1 or eDP1) in your system.

After starting the X-Window, configure the displays using:

  1. xrandr --output HDMI-1 --primary
  2. xrandr --output eDP-1 --mode 1360x768 --right-of HDMI-1
  3. sleep 1     # without it, the following display was not properly configured
  4. xrandr --output DVI-I-1-1 --left-of HDMI-1

You probably want to add those commands to your ~/.xinitrc.
I chose the 1360x768 mode to have the same DPI on all attached displays.

Problems when upgrading from Debian 9 (Stretch) to Debian 10 (Buster)

The above setup worked under Debian 9 (Stretch).
However, after upgrading to Debian 10 (Buster) by following the instructions in the Release Notes for Debian 10 (buster), 64-bit PC, chapter 4, either the X-Window server did not work or the display connected via the docking station was misconfigured.

I got it to work as follows:

  1. git pull
  2. If the latest commit does not work for you, try:
    git checkout fcb6ce5bc36c774af2d7f792842bcd2ede9c7483
    as this commit worked for me after performing the following steps.
  3. Reinstall the driver by running displaylink-debian.sh and following the instructions in README.md.
  4. Finally, replace the contents of the file /etc/X11/xorg.conf.d/20-displaylink.conf, installed by the above instructions, with the following:
    Section "ServerLayout"
        Identifier "layout"
        Screen 0 "Intel Graphics"
        Inactive "nvidia"
    EndSection
    
    Section "Device"
        Identifier "intel"
        Driver "modesetting"
        Option "PageFlip" "false"
        Option "AccelMethod" "None"
    EndSection
    
    Section "Screen"
        Identifier "intel"
        Device "intel"
    EndSection
    
    Section "Device"
        Identifier "nvidia"
        Driver "nvidia"
        Option "ConstrainCursor" "off"
    EndSection
    
    Section "Screen"
        Identifier "nvidia"
        Device "nvidia"
        Option "AllowEmptyInitialConfiguration" "on"
        Option "IgnoreDisplayDevices" "CRT"
    EndSection
    
    Section "Device"
        Identifier "Intel Graphics"
        Driver "modesetting"
        Option "VSync" "false"
    EndSection
    
    Section "Screen"
        Identifier "Intel Graphics"
        Device "Intel Graphics"
    EndSection
  5. You need to restart the X server (I restarted the entire laptop to be on the safe side).

See GitHub issue: AdnanHodzic/displaylink-debian, Debian buster #308 for a similar bug report.

Credits

I wish to thank Boris Shtrasman for reviewing a draft of this post and providing a lot of feedback. Of course, any remaining inaccuracies in this post are my sole responsibility.

I feel psychologically unsafe when working in big corporations

Three times during my career, I worked in big corporations.

1. Intel – Haifa, Israel

The first time, I worked at the Intel design center in Haifa, Israel. At the time, unlike today, the operations in Haifa were small.

I left to pursue my M.Sc. after five and a half years, during which time the operations in Haifa grew to employ hundreds of people.

With hindsight, it turned out that there was also a manager who wanted me out of Intel for his own reasons.

2. SanDisk – Kfar Sava, Israel

The second time, I worked at SanDisk in Kfar Sava, Israel.

I noticed that I felt anxious the entire time I was working there. I left the job after half a year.

Among other things, I got into a serious disagreement with a manager in another unit about a problem whose solution was critical to the success of an assignment I was given.

Reflections

Before accepting the job offer from Google Ireland (see below), I reviewed my experiences at Intel and SanDisk and made a list of recommendations on how to improve my chances of being successful at Google.

One of the recommendations was to identify a high-ranking manager who is interested in helping smart deaf people succeed in their Hi-Tech jobs, and who could advocate for me in case of misunderstandings between me and managers in remote units.

3. Google – Dublin, Ireland

The third time, I worked at Google Ireland. No high-ranking manager was available to advocate for me as needed. I was again anxious all the time. I left the job after three and a half months.

I chose to leave the job rather than accept a demand that I apologize for a harsh but non-personal expression I used during a discussion about an accessibility problem at an American bank that worked with Google.

I knew, without using the term psychological safety, that if I apologized I would not be able to feel psychologically safe whenever I had to point out problems with proposed plans or designs.

Background anxiety

The unending anxiety I felt while working at SanDisk and Google stemmed from fear of offending managers in remote units, whom I did not know personally but with whom I had to interact to fulfill my work duties. I could not be confident that I would have the support of my own bosses if there was any problem with remote managers.

Now there is research pointing out what I was missing during my work at SanDisk and Google. Ironically, the research was performed at Google about a year after I left the company.

High-Performing Teams Need Psychological Safety. Here’s How to Create It

The five keys to a successful Google team

An earlier version of this article was published on LinkedIn as: Psychological Safety – the reason why I did not survive in big corporations

Anonymizing datasets for machine learning

Preface

All of us are familiar with the idea of anonymizing datasets to get rid of personally identifiable information, in order to enable data mining while preserving (as much as possible) the privacy of the people whose data was collected. The basic idea is to modify names, ID numbers (Social Security Numbers in the USA), home addresses, birthdays, IP addresses and similar information. Sometimes, one also needs to get rid of information about age/gender/nationality/ethnicity.

This method has been the subject of a lot of research, and it is easy to find relevant papers and articles with the help of search engines. See the Bibliography for examples.

However, there is also another transformation of datasets. Unlike anonymization as described above, this transformation is not about privacy preservation; it is about hiding the nature of the data being processed. Lacking a better term, we'll use anonymization for this transformation as well.

One possible application for this kind of anonymization is when one develops a revolutionary model for predicting the future behavior of the world's stock exchanges by following various economic indicators and other publicly available time-dependent data sources.

In such an endeavor, the developer has typically gathered a lot of data and wants to use it to train his revolutionary machine learning model. Since he cannot afford to build his own data center, he rents a lot of computing power from one of the cloud providers.

However, he does not want to take the risk of an unscrupulous employee of the cloud provider stealing his secret data or model and using it for his own benefit. He also wants to limit the damage if a black-hat hacker breaks into his rented computers.

Some users might want to process information that interests governments, such as the Chinese government. Such governments have the resources to break into cloud computers.

The classical way to mitigate such risks is to encrypt/recode/scramble (henceforth, I'll refer to all those operations as encryption) the data being uploaded to the cloud. However, this encryption must be done in such a way that the data is still suitable for training the model. In addition, when running the model to make a prediction, the raw model results need to be generated in encrypted form, for decryption on the developer's on-premises computer (to which I will refer as the workstation henceforth). From this point on, we'll use the terms anonymization and encryption interchangeably.

When looking for relevant research on this second kind of anonymization, I did not easily find relevant information. This motivated me to write this article.

Glossary

The following symbols are described in order of their appearance in the text.

    • M: the transfer function of a machine learning system.
    • A: the argument of M – the data used by a machine learning system to make a prediction.
    • a_j: the j^{th} element of A.
    • P: the value of M(A) i.e. the prediction that the machine learning system makes when presented with data A.
    • p_k: the k^{th} element of P.
    • I: the identity function. For all x, I(x) = x.
    • F^{-1}(x) is the inverse of F(x), for any function F(x): for all relevant x, F^{-1}(F(x)) \equiv x \equiv F(F^{-1}(x)).
    • Functional composition: for all relevant x, (F_1 \circ F_2)(x) \equiv F_1(F_2(x)). For example, F^{-1} \circ F \equiv I \equiv F \circ F^{-1}.
    • E_a(A): a function which encrypts the argument A. Its inverse is denoted by E^{-1}_a(A'), which decrypts A', an encrypted version of the argument A.
    • D_p(P'): a function which decrypts the encrypted prediction P'. Its inverse is denoted by D^{-1}_p(P), which encrypts the prediction P.

Architecture of machine learning systems

A machine learning system is used to approximate a function M, which makes a prediction (or classification or whatever) P, given the n-tuple A which packs together several argument values:

\displaystyle{}P = M(A)

where:

\displaystyle{}A = (a_1, a_2, \ldots, a_m)

is the argument, and

\displaystyle{}P = (p_1, p_2, \ldots, p_n)

is the prediction.

The values a_j of the argument and p_k of the prediction can be of any data type; they are not limited to scalars. This is why tuple notation is used rather than vector notation.

Examples of machine learning system applications:

  • Picture classification. When presented with a picture of an animal, the system tells how likely the animal is to be a dog, a cat or a horse. The system is trained by presenting it with several pictures, each accompanied by a label identifying the animal shown.
  • Prediction of the next few values of a time series, such as the numbers which describe the weather at a particular location. The system is trained by using relevant historical information.

Machine learning systems are sometimes implemented using neural networks. Neural networks have the property that a sufficiently large network can be trained to approximate any function that meets certain reasonable conditions.

A machine learning system is trained to implement a good approximation of the function M by processing several 2-tuples (A_i, P_i), each associating a prediction P_i – the desired value of the function (usually an n-tuple) – with the corresponding argument value A_i (usually an m-tuple).

The training process is very computationally intensive, so people often resort to cloud computing facilities, as mentioned above.

Architecture of anonymized machine learning systems

When a user does not want to let the cloud provider know what he is doing, one possible approach is to train the model using encrypted data streams, so that the model's outputs are encrypted as well. The data streams are encrypted on the user's workstation. The workstation is also used to decrypt the model's predictions.

The whole system can be described using the following formulae.

Original system:

\displaystyle{}P = M(A)

We add identity functions before and after M:

\displaystyle{}P = I \circ M \circ I(A) = I(M(I(A)))

The next step is to decompose the identity functions into pairs of a function and its inverse. The functions being used perform encryption and decryption.

\displaystyle{}P = (D_p \circ D_p^{-1}) \circ M \circ (E_a^{-1} \circ E_a(A))

where E_a(A) encrypts the argument A and D_p(P') decrypts the prediction P'.

Now we rearrange parentheses as follows:

\displaystyle{}P = D_p \circ (D_p^{-1} \circ M \circ E_a^{-1}) \circ E_a(A)

Now the system can be decomposed into three parts, which perform the following operations:

  1. Encrypt the argument A: \displaystyle{}A' = E_a(A)
  2. Actual encrypted machine learning system: \displaystyle{}P' = D_p^{-1} \circ M \circ E_a^{-1}(A') = M'(A')
  3. Decrypt the encrypted prediction P': \displaystyle{}P = D_p(P')

where A' and P' are the encrypted argument and prediction respectively.
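As a toy illustration of this three-part decomposition (everything here is invented for the example; a real M would be a trained model, not a fixed formula, and E_a and D_p would be secret):

```python
# Toy model: M sums the elements of the argument tuple.
def M(A):
    return sum(A)

# Argument encryption E_a and its inverse (elementwise affine map).
def E_a(A):
    return tuple(2 * a + 1 for a in A)

def E_a_inv(A_enc):
    return tuple((a - 1) / 2 for a in A_enc)

# Prediction decryption D_p and its inverse.
def D_p(P_enc):
    return (P_enc - 3) / 5

def D_p_inv(P):
    return 5 * P + 3

# The model actually deployed in the cloud: M' = D_p^{-1} o M o E_a^{-1}.
def M_prime(A_enc):
    return D_p_inv(M(E_a_inv(A_enc)))

A = (1.0, 2.0, 3.0)
# Workstation encrypts, cloud runs M', workstation decrypts:
P = D_p(M_prime(E_a(A)))
assert abs(P - M(A)) < 1e-9   # same prediction as the original system
```

The cloud only ever sees E_a(A), M' and M'(E_a(A)); neither the raw argument nor the raw prediction leaves the workstation.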

The functions E_a(A) and D_p(P') need to be invertible, as their inverses are part of the function approximated by the machine learning model M', which is the second part of the system and the one actually run on the cloud provider's computers.

The first and third parts are implemented on the user’s workstation. The typical implementation relies upon keys and scrambling formulae.

Two more requirements are:

  • The machine learning model P' = M'(A') must be implemented using a technology sophisticated enough to also embed the nonlinear invertible functions D_p^{-1} and E_a^{-1}.
  • There must be sufficient training and validation data to train the model, which now embeds those nonlinear invertible functions.

Types of data

When dealing with anonymization of data, one has to consider separately each of the following data types.

  • Variable names
  • Numerical variables
  • Ordinal variables
  • Categorical variables
  • Time based variables

Variable names

Variable names are used for naming the various variables which are part of the argument and prediction of the machine learning model. They are used for inspecting the argument’s data streams and for retrieving relevant parts of the model’s prediction.

Of course, the cloud provider should not be exposed to the true names of the variables.

Variable names can be converted into meaningless strings, for example by using standard password-scrambling algorithms such as salted MD5 (salt+md5sum).

The user's workstation would have tables for mapping between the true variable names and the names used by the model and the databases in the cloud.
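A minimal sketch of such a mapping (the salt and the variable names are invented for the example; the salt and the lookup tables stay on the workstation):

```python
import hashlib

SALT = b"my-secret-salt"          # kept only on the workstation

def scramble_name(name: str) -> str:
    """Map a true variable name to a meaningless but stable identifier."""
    digest = hashlib.md5(SALT + name.encode("utf-8")).hexdigest()
    return "v_" + digest[:12]     # prefix so the result is a valid identifier

# Workstation-side mapping tables (true name <-> cloud name).
true_names = ["unemployment_rate", "sp500_close", "cpi_monthly"]
to_cloud = {n: scramble_name(n) for n in true_names}
from_cloud = {v: k for k, v in to_cloud.items()}

assert from_cloud[to_cloud["sp500_close"]] == "sp500_close"
```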

Numerical variables

Numerical variables can be transformed using invertible functions.

Also, if the argument A has several numerical elements (including time-based elements), one could treat them as a single vector and transform it using an invertible matrix.

Mathematically, it could look as follows:

\vec {A_v'} = E_{av}(\vec {A_v}) = E_{matrix} \vec {A_v}

where:

  • \vec {A_v} is the restriction of A to numerical variables.
  • \vec {A_v'} is the encrypted version of \vec {A_v}.
  • E_{av} is the argument’s encryption function, restricted to numerical elements of the argument A.
  • E_{matrix} is an invertible transformation matrix.

Invertible scalar functions could be applied to A_v's elements before and after the matrix transformation.
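A small sketch of the matrix transformation, with a hard-coded invertible 2x2 matrix (real code would generate a random invertible matrix and keep it secret on the workstation):

```python
# E_matrix and its inverse; det = 2*1 - 1*1 = 1, so it is invertible.
E_MATRIX = [[2.0, 1.0],
            [1.0, 1.0]]
E_MATRIX_INV = [[1.0, -1.0],
                [-1.0, 2.0]]

def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

A_v = [3.0, 4.0]                 # numerical elements of the argument
A_v_enc = matvec(E_MATRIX, A_v)  # encrypted vector, sent to the cloud
A_v_dec = matvec(E_MATRIX_INV, A_v_enc)
assert A_v_dec == A_v            # the transformation round-trips
```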

If the argument has also an element, which is a categorical variable, one could use a different transformation for each value of the categorical variable.

Ordinal variables

The values of ordinal variables could be permuted. The learning model will implicitly embed the inverse permutation.
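For example (the ordinal scale and the permutation are invented; the permutation stays on the workstation):

```python
# An ordinal scale and a secret permutation of its values.
levels = ["low", "medium", "high", "critical"]
permuted = ["high", "low", "critical", "medium"]

encode = dict(zip(levels, permuted))   # applied before upload
decode = dict(zip(permuted, levels))   # applied on the workstation

assert decode[encode["medium"]] == "medium"
```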

Categorical variables

Shuffling categories is not enough, because categories could be identified by their frequencies (similar to applying Zipf's law when breaking substitution ciphers).

The following approach is probably not universally applicable.

Categories could be anonymized by splitting a frequently occurring category into several subcategories. The learning model will give a different prediction for each subcategory. The different predictions will have to be somehow combined in the user’s workstation.

This approach also requires the model to be formulated in such a way that the final prediction can be derived by combining the predictions corresponding to the subcategories of split categories.
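A sketch of the splitting step (the data is invented, and the recombination shown, a plain average, is only a toy placeholder; the right way to combine per-subcategory predictions depends on the model):

```python
import random

def split_category(records, category, n_subcategories, seed=0):
    """Replace each occurrence of a frequent category by a randomly
    chosen subcategory, flattening its frequency signature."""
    rng = random.Random(seed)
    out = []
    for value in records:
        if value == category:
            value = "%s_%d" % (category, rng.randrange(n_subcategories))
        out.append(value)
    return out

data = ["a", "b", "a", "a", "c", "a"]
split = split_category(data, "a", 3)
# Only the frequent category "a" was renamed; others are untouched.
assert all(v == orig or v.startswith("a_") for v, orig in zip(split, data))

# On the workstation, the predictions for a_0..a_2 are later combined,
# e.g. by averaging (toy placeholder):
def combine(predictions):
    return sum(predictions) / len(predictions)
```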

Time based variables

When anonymizing time-based variables, one needs to transform the argument to hide any dependence it has upon weekly, monthly, seasonal or yearly cycles. One also needs to hide dependencies upon well-known events, such as volcanic eruptions or the rising CO_2 concentration in the air.

Otherwise, it would be possible to identify dates by looking for correlations with well-known timings.

One possible way to hide those dependencies is to apply an ARIMA forecasting model to the argument.
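One simple, invertible building block of such models is seasonal differencing; here is a sketch (the series and the period of 7, for a weekly cycle, are invented; full ARIMA machinery is considerably more involved):

```python
def seasonal_diff(xs, period):
    """Keep the first `period` values, then store each value's
    difference against the value one period earlier."""
    return list(xs[:period]) + [xs[i] - xs[i - period]
                                for i in range(period, len(xs))]

def seasonal_undiff(ds, period):
    """Invert seasonal_diff (done on the workstation)."""
    out = list(ds[:period])
    for i in range(period, len(ds)):
        out.append(ds[i] + out[i - period])
    return out

series = [10, 12, 11, 13, 15, 14, 16, 11, 13, 12, 14, 16, 15, 17]
assert seasonal_undiff(seasonal_diff(series, 7), 7) == series
```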

Bibliography

The following articles are about getting rid of personally-identifiable information in order to preserve privacy.

      1. https://en.wikipedia.org/wiki/Data_anonymization
        • Generalization.
        • Perturbation.
      2. http://blog.datasift.com/2015/04/09/techniques-to-anonymize-human-data/
        The methods proposed by this article could interfere with machine learning, except for sufficiently small perturbations.
      3. https://www.elastic.co/blog/anonymize-it-the-general-purpose-tool-for-data-privacy-used-by-the-elastic-machine-learning-team
        • Suppression of fields.
        • Generation of semantically valid artificial data (such as strings). There is a Python module – Faker – which is good for faking names, addresses and random (lorem ipsum) text.
        • The methods, mentioned in this article, cannot anonymize numeric data.
      4. https://docs.splunk.com/Documentation/Splunk/7.2.3/Troubleshooting/AnonymizedatasamplestosendtoSupport
        Anonymization of data such as usernames, IP addresses, domain names.
      5. https://www.oreilly.com/ideas/anonymize-data-limits
        Human data cannot really be anonymized.
      6. https://www.intel.co.kr/content/dam/www/public/us/en/documents/best-practices/enhancing-cloud-security-using-data-anonymization.pdf
        Several methods for anonymizing data such as identifying information of humans, IP addresses, etc:

        • Hiding
        • Hashing
        • Permutation
        • Shift
        • Enumeration
        • Truncation
        • Prefix-preserving
      7. https://ieeexplore.ieee.org/abstract/document/6470603
        Usage of MapReduce to anonymize data.

Addendum

After finishing the first draft of this post, I was informed of the following.

Credits

I wish to thank Boris Shtrasman for reviewing a draft of this post and providing a lot of feedback. Of course, any remaining inaccuracies in this post are my sole responsibility.

Python discovers its inner PHP and JavaScript personae

Did you recently switch from PHP or JavaScript to Python, and are missing the fun of being bitten by your programming language?

The collection of surprising Python snippets and lesser-known features is your ultimate guide for provoking Python to bite you in the arse.

Security and Obscurity

If you do not know the password but know how to use the password to gain access to something that was secured using this password, then this is security by obscurity.

On the other hand, if you know the password but do not know how to use the password, then this is obscurity by security.

(Sources of inspiration: The Butterfly Dream; Category Theory's reversal of arrows.)

Do material design icons fail to show in your Cordova/Vue/Vuetify/Android application?

When developing an Android application using Cordova 8.0.0 with Vue and Vuetify, I noticed that the beautiful material design icons do not show as expected.

There were two problems.

Problem 1: ligature transformations in Android’s WebView

Ligature transformations did not work in Android’s WebView, so instead of icons one sees text strings.

To fix it, I had to substitute '' for 'keyboard_arrow_up' and make similar substitutions for the other icons. This list of icon names and their corresponding codepoints can help.

Problem 2: missing glyph indicators

After getting rid of the text strings, instead of beautiful icons I got missing glyph indicators.

It turns out that the css file generated by the build process (npm run build followed by cordova build android) expects the font files containing the icon glyphs to be found in android_asset/www/dist/static/css/static/fonts, but they are actually located in android_asset/www/dist/static/fonts.

The fix is to use a Cordova hook script to transform the relevant links in the css file.

There are two possible locations for the hook script.

  1. Drop it into the ./hooks/before_build/ subdirectory.
  2. Specify its location in ./config.xml.
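For the second option, the declaration in ./config.xml would look roughly like this (the script path is an example of my choosing; see the Cordova hooks documentation for details):

```xml
<hook type="before_build" src="scripts/fix_css_font_paths.sh" />
```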

I am not very experienced with the platform, so I chose the first location.
Due to a similar reason (inexperience with node scripts), I wrote it quickly as a /bin/bash shell script.
Here it is in its full glory:

#!/bin/bash
echo ============================================
echo Fix paths to font files in Android css files.
echo ============================================
CSSFILES=$1/www/dist/static/css/*.css
echo The relevant css files are: ${CSSFILES}
mkdir -p /tmp/fixcssfiles
rm -vf /tmp/fixcssfiles/*
for cssfname in ${CSSFILES} ; do
  #cp ${cssfname} /tmp/fixcssfiles/`basename ${cssfname}`
  #                            # (Save a backup css file)
  sed 's=url(static/fonts/=url(../fonts/=g' < "${cssfname}" \
                             > /tmp/fixcssfiles/tmpcssfname
  mv /tmp/fixcssfiles/tmpcssfname "${cssfname}"
  echo Processed "${cssfname}"
done
echo ============================================

After adding the script, I got to see the material design icons when running the Android application.

Android unit testing and Mazer Rackham

כבר אמר מייזר רקהאם (“המשחק של אנדר”) שאין מורה כמו האוייב.
נזכרתי בזה במהלך המלחמה שלי בבניית בדיקות יחידה לאפליקציה לאנדרואיד בסביבת הבדיקה של API 24 והלאה.
Mazer Rackham (“Ender’s Game”) said: There is no teacher but the enemy.
I was reminded of this during my war of building unit tests for an Android application in the testing environment of API 24 and later.

I got the ability to work with Heroku using my Debian Stretch system

The other day I found that:

  1. Heroku needs Python 3.6 or later to work (as of June 22, 2018). See: Getting Started on Heroku with Python.
  2. Debian Stretch (Debian Stable as of June 22, 2018) and its backports have only Python 3.5.

The solution was to build a Docker image based upon Ubuntu 18.04, which does have Python 3.6. See the project https://gitlab.com/TDDPirate/heroku_on_debian in GitLab.

July 15, 2018 update:

After I complained about flakiness of Selenium-based tests when the Selenium server is running outside of the Docker container while the application runs inside the container, Udi Oron suggested another way to run Python 3.6 on a Debian Stretch system: use pyenv.

It turns out that pyenv solves the pain point of running Python 3.6 on Debian Stretch without having to use a container. Selenium-based tests are now stable.

The following is an excellent article about using pyenv:
Pyenv – Python Version Management Made Easier

And the following is a link to the GitHub repository:
https://github.com/pyenv/pyenv

I suspect that pyenv is the reason why people are not in a hurry to backport new Python versions to Debian.

How to visually compare two PDF files? (cont’d)

When I asked the above question in a Telegram group, people proposed other tools as well, which I summarize below.
Amiad Bareli, Amit Aronovitch, Meir Gil and Yehuda Deutsch – thanks.

  1. ImageMagick compare
  2. matplotlib testing framework – also supports PDF:
    >>> import matplotlib.testing.compare
    >>> matplotlib.testing.compare.comparable_formats()
    ['png', 'eps', 'svg', 'pdf']
  3. pHash – the open source perceptual hash library.

How to visually compare two PDF files?

I have an application written in Python, which uses the ReportLab package for exporting PDF files.

Of course, the application needs to be tested. Among other tests, the PDF export function needs to be tested to ensure that the visual rendering of PDF files did not unexpectedly change.

Since it is possible to create and save an expected-results PDF file using fabricated test data, the above implies the need to compare two PDF files. It turns out that two PDF files created from the same data on two different dates are different, due to embedded timestamps.

Hence the need to compare visual renderings of the PDF files. ImageMagick's convert knows how to convert PDF files into PNG. However, one needs to set the background and remove the alpha channel.

convert also knows how to perform a bitwise XOR on two image files, but it must be told how to compute it. This is documented in StackOverflow: Searching for a way to do Bitwise XOR on images.
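The principle behind the XOR comparison can be illustrated without ImageMagick: XOR two equal-size bitmaps, and any nonzero pixel marks a visual difference (the tiny 8-bit grayscale "images" below are invented for the example):

```python
def xor_images(img_a, img_b):
    """Pixelwise XOR of two equal-size 8-bit grayscale images,
    represented as flat lists of ints in 0..255."""
    assert len(img_a) == len(img_b)
    return [a ^ b for a, b in zip(img_a, img_b)]

page_a = [0, 0, 255, 128, 0, 64]
page_b = [0, 0, 255, 130, 0, 64]   # one pixel differs

diff = xor_images(page_a, page_b)
assert any(diff)                                         # renderings differ
assert xor_images(page_a, page_a) == [0] * len(page_a)   # identical pages
```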

The script in https://gitlab.com/TDDPirate/compare_pdfs implements all of the above.