Speech to Text in Hebrew

Yesterday evening I experimented with two STT (speech to text) services – Live Transcribe and WebCaptioner.

I operated both of them in a meeting whose language was Hebrew. The meeting included a lecture in a hall and remote connection via Zoom. I elected to connect via Zoom.
Live Transcribe ran on a tablet, which eavesdropped on my laptop via speakers+microphone, and WebCaptioner ran in a browser on my laptop. Zoom ran on my laptop as an application. Both STT services were set up to recognize Hebrew speech.

My finding was that most of the time, the services did not deliver the goods. They emitted Hebrew words without grammar and out of the meeting’s context. However, there were moments in which they worked correctly.
I also noticed that when the services did not work correctly, each of them had different output. When they worked correctly, the texts they produced were similar to each other.
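
The observation above suggests a rough check: when two independent STT services agree with each other, their output is more likely to be correct. Below is a minimal sketch of such a comparison, using Python's standard difflib module. The function name and the sample strings are hypothetical illustrations, not part of either service, and any threshold for "good enough" agreement would need to be tuned by experiment.

```python
from difflib import SequenceMatcher


def word_agreement(text_a: str, text_b: str) -> float:
    """Return a 0.0-1.0 similarity score between two transcripts.

    The transcripts are compared word by word, so a single wrong
    word counts as one mismatch regardless of its length.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    return SequenceMatcher(None, words_a, words_b).ratio()


if __name__ == "__main__":
    # Hypothetical outputs of two services for the same few seconds of speech.
    from_tablet = "the lecture starts in five minutes"
    from_browser = "the lecture starts in nine minutes"
    print(f"agreement: {word_agreement(from_tablet, from_browser):.2f}")
```

A low score would hint that at least one of the services is guessing, for example because the lecturer moved away from the microphone.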

During the meeting, the lecturers did not use a wireless microphone located near their mouths. They stood at different distances from the microphone. When they stood near the microphone, the services worked better than when the lecturers stood far away from the microphone.
By “worked better”, I mean that the services emitted text continuously, rather than staying silent for long periods with short text segments interspersed.

The above confirms what I found a long time ago – STT services need to receive the same treatment as HOH (hard of hearing) people. Just as environmental noises interfere with a HOH person’s ability to understand speech, they also interfere with STT services.

המרת דיבור לטקסט בעברית

אתמול בערב עשיתי ניסוי ב-Live Transcribe וב-WebCaptioner.

הפעלתי את שתיהן במפגש שהתקיים בעברית. המפגש כלל הרצאה באולם + אפשרות להתחבר דרך זום. בחרתי להתחבר דרך זום.
ה-Live Transcribe הופעל על טאבלט שצותת ללאפטופ באמצעות רמקולים+מיקרופון, ואילו WebCaptioner הופעל בדפדפן על אותו הלאפטופ שעליו זום רץ כאפליקציה. שתי התוכנות כוונו לזהות דיבור בעברית.

הממצא שלי היה שרוב הזמן שתי התוכנות לא סיפקו את הסחורה. הן פלטו מילים עבריות ללא תחביר וללא קשר עם נושא המפגש. עם זאת היו רגעים שבהם הן עבדו נכון.
שמתי לב גם שכשהתוכנות לא עובדות נכון, כל אחת מוציאה פלט אחר. כשהן עובדות נכון, הטקסטים שהן מוציאות דומים זה לזה.

במפגש, המרצים לא השתמשו במיקרופון אלחוטי שנמצא קרוב מאוד לפה שלהם, אלא היו עומדים במרחקים משתנים מהמיקרופון. כשעמדו קרוב למיקרופון, התוצאות היו יותר טובות מאשר כשהם עמדו רחוק ממנו.
ב-“יותר טובות” אני מתכוון לכך שהתוכנות פלטו כל הזמן טקסט, במקום שתיקות ארוכות עם קטעי טקסט קצרים מפעם לפעם.

זה מאשר את הממצא שעליתי עליו לפני הרבה זמן – צריך להתייחס לתוכנות לזיהוי דיבור ממוחשב כמו אל כבדי שמיעה. כמו שרעשי סביבה מפריעים להם מאוד להבין דיבור, גם לתוכנות רעשי סביבה מפריעים מאוד.

I want to scrape a PDF file. Where can I find information?

Personally, I had a good experience with poppler-utils for text scraping. For tables, I used Tabula (actually I used tabula-py, which is a simple wrapper for Tabula).
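
As a starting point, here is a minimal sketch of how the two tools could be driven from Python. It assumes that the pdftotext command from poppler-utils is on the PATH and, for tables, that tabula-py and a Java runtime are installed. The function names and file names are hypothetical, and this is not necessarily how I used the tools myself.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, layout=True):
    """Build the pdftotext command line (pdftotext is part of poppler-utils)."""
    cmd = ["pdftotext"]
    if layout:
        # -layout tries to preserve the physical layout of the page.
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def extract_text(pdf_path, txt_path):
    """Run pdftotext and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install poppler-utils")
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    return Path(txt_path).read_text(encoding="utf-8", errors="replace")


def extract_tables(pdf_path):
    """Return the tables found in the PDF as a list of pandas DataFrames."""
    import tabula  # imported here so that text extraction works without tabula-py

    return tabula.read_pdf(str(pdf_path), pages="all")
```

For example, extract_text("report.pdf", "report.txt") would write the text to report.txt and also return it as a string.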

Rufus Pollock’s tools review article from 2016 is still relevant today.

If you prefer to read a tutorial, here is one: Get Started With Scraping – Extracting Simple Tables from PDF Documents. Note that it was written in 2013, so some of the tools may have died by now. Another tutorial: How I parse PDF files.

Finally, a paper about the challenges of scraping PDF files: Towards High-Quality Text Stream Extraction from PDF.


Want to forward messages with images in Evolution E-mail client

Problem

I use the E-mail client Evolution, version 3.30.5-1.1, running under Linux (Debian Buster distribution). I have configured it to default to Plain Text format when creating new messages.

I found that when I get an HTML message with embedded images and want to forward it to someone else, the text is forwarded but not the images, even when I set the forwarded message’s format to HTML (instead of Plain Text).

This happens in all four possible ways of forwarding the message (Forward As Attached, Forward As Inline, Forward As Quoted, Redirect).

Solution

The workaround I found is to reconfigure Evolution to default to HTML format before forwarding the message, and return to Plain Text afterwards. So when I click on the Forward button and Evolution initially constructs the message to be forwarded, it includes all contents of the original message, including the embedded images.

To reconfigure the default message format:

  1. Open the “Evolution Preferences” pop-up dialog: Edit / Preferences.
  2. Select the pane of the pop-up dialog: Composer Preferences / General.
  3. Toggle the checkmark in Default Behavior / Format messages in HTML.
  4. Click on the Close button at bottom right of the dialog.

After defaulting to HTML format, forwarding an HTML-formatted message with images preserves the images, in all four possible ways of forwarding the message.

(Submitted to gitlab.gnome.org as Evolution issue #1406 and to bugs.debian.org as bug #984599.)

Which videoconferencing software supports closed captions?

If you need to explore alternative videoconferencing services, the following may help you. This information is up to date as of March 1, 2021.

Zoom has a free tier, integration support, closed captions and an API for 3rd party captioning.

RingCentral has a free tier, integration support and closed captions, but no API for 3rd party captioning.

Webex has a free tier and integration support. I found no closed captions support!

Caption, for the deaf and HoH, a webinar streamed from Zoom to YouTube

Are you organizing a Zoom webinar and you need to both stream it to YouTube and provide real time closed captions for making it accessible to the deaf and Hard of Hearing?
Have you found, to your horror, that you do not know where to start?

Well, the following links will help you.

Additional links about making Zoom meetings and webinars accessible.

You feel guilty due to very high pay at work – what to do?

Say you are paid a piece rate for work you do as a contractor. However, your employer has set things up in such a way that you are very productive. You feel that you are being overpaid and feel guilty about it.

What to do?

Do not suggest a reduction of your piece rate.

Instead of taking less from your employer, give your employer more.

Invest time in higher quality work even if it slows you down a bit. Do more work on your pieces. Look for ways to further optimize the workflow of which you are a part. Train other workers to be more productive.

Theodor Herzl and the Basic Income plans

One hundred and twenty years ago, Theodor Herzl published a cautionary tale about what we know today as universal basic income plans. He probably based his concerns upon the experience of the Romans under the Lex Frumentaria, a plan under which grain was bought from North Africa and Sicily and distributed to citizens at a low price. See also Gaius Gracchus.

Theodor Herzl wrote the story in German, and it was translated into Hebrew. I remember reading the Hebrew translation of the story in my childhood.

When I wanted to present to English speakers a contrarian point of view about the Universal basic income plans, I found to my surprise that no English translation of the story existed.

Well, the English translation of Herzl’s story is now available for your enjoyment and education.

Alleviate social suffering from the COVID-19 by shortening its incubation period

A crazy idea:

Develop a preventive treatment for COVID-19 whose effect would be to shorten (yes, shorten!) the incubation period from infection until the disease’s symptoms develop, so that the incubation period would be one or two days long, as with the flu, instead of a week or even longer.

Then, ask everyone to undergo the treatment (maybe take pills).

This approach has a few advantages:

  1. People who were infected will infect fewer other people, because they will know that they were infected and will isolate themselves promptly.
  2. There will be less need for PCR tests to confirm COVID-19 infection (why are we not doing tests to confirm flu infection?).
  3. People who need to self-isolate can leave isolation sooner, as the confirmation of their health would arrive earlier.
  4. It is possible that, thanks to a change in the course of the disease, fewer people will suffer from its serious form. Of course, the opposite can happen, and then it will be necessary to find another treatment which does not have this side effect.

הקלה על הסבל החברתי מהקורונה על ידי קיצור תקופת הדגירה שלה

רעיון מטורף: לפתח טיפול מונע לקורונה שהפעולה שלו תהיה לקצר (כן, לקצר!) את תקופת הדגירה מרגע ההידבקות ועד להתגלות סימפטומי המחלה, כך שיהיה יום יומיים כמו שפעת במקום להיות שבוע ואפילו שבועיים.

ואז לבקש מכל אחד לעבור את הטיפול (אולי לקחת כדורים).

לדרך פעולה זו יש כמה יתרונות:

  1. אנשים שנדבקו – ידביקו פחות אנשים, כי יידעו שהם חולים ויבודדו את עצמם יותר מהר.
  2. הצורך בבדיקות לאימות הידבקות בקורונה יירד (למה לא עושים בדיקה לאימות שנדבקנו בוירוס שפעת?).
  3. אנשים שצריכים להיות בבידוד יוכלו להשתחרר מהבידוד תוך זמן יותר קצר.
  4. יש מצב שבגלל שינוי מהלך המחלה, אחוז יותר קטן של אנשים יסבלו מהצורה החמורה שלה. כמובן שיכול להיות גם מצב הפוך ואז יהיה צורך לחפש טיפול אחר שאין לו תופעת הלוואי הזו.

What to reply to a computer science student who asked you to be his accomplice in cheating?

You are probably familiar with the phenomenon of students who pay other people to write term papers, theses and projects, which they then submit in order to meet academic requirements.

A few years ago, a computer science student named R. (a pseudonym) approached me and asked me to write a computer program for him and his partner, which they would submit in order to pass a course they were taking.

Instead of taking money from him, I replied to him as follows.

I am approaching your question from the point of view of a mentor, teacher or a wise person needing to advise a young person, who is in a difficult situation and who is considering a bad solution to his problem. What the young person really needs is not to have someone else do his project for him, but long-term thinking: what are the long-term consequences of this solution, what alternative solutions exist, which obstacles exist in the alternatives, how to overcome those obstacles, the need to summon courage to change course.

For starters, as far as I am concerned, what you asked for is in the grey area between cheating and having an original solution to the problem. This is because certificates are not worth that much in the vocation of software development. Either the developer knows how to program or he doesn’t, no matter what degrees or impressive certificates he has. If he does not know how to program, then within half a year his employer, if the employer has a clue, finds out and gives him a kick in the ass – reducing the long-term damage. Also, there are people who take on big projects and hire other people to do the actual work. However, the difference is that they have to provide the project with services such as marketing, project management, search and selection of development tools, money handling, etc. – instead of (or in addition to) software development skills.

Now to the point. Before proceeding further with what you and your partner are contemplating doing – I highly recommend that both of you read Ayn Rand’s “The Fountainhead” and follow Peter Keating’s career development in the book. He started out relying upon other people, like you are contemplating doing, made an impression on the right people and reached the top of his profession.

But… he didn’t last long and eventually he fell. And the sad truth is that he trained for the wrong vocation. There was a vocation that suited him perfectly, and he could really excel at it, but his mother pressed him to learn the vocation he actually learned (and in which he eventually failed). The saddest thing about his story is that when he realized which vocation is right for him and started engaging in it – it turned out that he started it too late and could not reach a high level of proficiency in that vocation.

If you and your partner decide to pay someone else to do your project, then:

  • Anyone who knows that you have done this will be unable to help you look for a job, because they would have to lie in order to vouch for your software development skills.
  • During software development work, there are periods of extreme pressure. Schools plan their course syllabuses so that an average student can handle the resulting pressure (with some sighs and groans). At work, pressure can be unlimited. So if you are unable to cope with pressure in school, it is very unlikely that you can cope with it at work. So you should consider a vocation in which there is no such pressure.
  • You give up the fight to be really good professionals, who know when to accept failure like men (even at work some projects fail, for all kinds of reasons, such as over-optimistic effort estimates, and it’s better to admit failure and move on to another project). Instead of accepting failure and its consequences, you are heading toward pretense.

What to do now?

I suggest that you first carefully review the decision-making process that led you to choose a vocation in the software world. If you have taken psychotechnic tests and consulted with a specialist in vocational selection, one of the tests was probably as follows:

  1. Go over a very long list of topics and highlight those which interest you.
  2. Group the interesting topics into groups, such that the topics in each group have the same theme from your point of view.
  3. Go over the groups and identify potential vocations related to each group.

Why am I telling you all this? Because if you kept the papers from your evaluation (or you can get them), you might find there a clue for identifying a vocation, which really attracts you and in which you can excel.

The next step is to determine whether you have relatives who are unwilling to accept that your future is not in the lucrative software world, but in another direction. Then check whether and how you can neutralize their influence upon your choice of the vocation that fits you.

I assume that the computer world is appealing to you, so you may want to check out some other vocations in this world besides writing software (I remember that in Hadassah Institute for Professional Selection Counseling in Jerusalem, where I did my vocational counselling, there was a library with descriptions of thousands of vocations – such a library could help you choose the right vocation for you). Examples: training, installation and configuring, software testing, maybe even administrative project management. Then go on to specialize in the vocation that suits you and in which you can excel.

True, you have already invested two years in your studies, and now I am proposing that you write off all this investment and start over? Yes. As far as getting a certificate or a degree is concerned, some of the investment will probably be lost. But as I said above, certificates are not that valuable in the software world. Just as a pilot’s license does not turn someone without the aptitude for flying into an ace fighter pilot, a software developer’s certificate does not turn someone unfit for software development into a great software developer. In terms of content – I’m sure you’ve learned something that will help you in any direction you choose for the rest of your life. And as far as the requirements for finishing your studies are concerned, once you know which direction is right for you, you can probably switch to a major which fits your vocational goals. In this case, you’ll probably be able to use some of the credits from the courses that you have already completed. So what you have already studied is not a total loss.

P.S.:

A student who pays someone else to do his homework, term papers, projects or theses is like a basketball player who pays someone else to go to his team’s practice sessions.

Orphan Technologies

Hi-Tech is failing people with disabilities

The other day, Nathan Zeldes wrote to me:

Between us, I’ve always been pissed off by the lack of progress in hi-tech solutions for severe handicaps; the fact that even the legendary Stephen Hawking was using a robot voice sounding like a Commodore 64 shows how little incentive companies (and society) have in driving leading edge solutions that could liberate people from severe disabilities.

To which I replied:

The problem is a lack of incentive to develop technologies which would help only a few people. It just is not profitable. People cannot have a decent standard of living or support a wife and children by working only on such problems. Subsidizing the development of such a technology could lead to the basic problem of socialism (the possibility of turning a profit NOT by serving another person, whereas serving others is the basis of “true” capitalism).

A similar problem exists with “orphan medicines” – medications and procedures for treating very rare illnesses.

What could be done?

In discussions with Nathan Zeldes and with Dr. Yoav Medan (who is involved with the orphan technology of 3D printing of prosthetic hands), the following ideas were mentioned.

1. Students doing Final Projects

  • STEM students can profit from working on an orphan technology as their final project. The students provide a service and, in exchange for it, gain experience which will help them later make more money in their careers. However, most students cannot bring a product to market. The best they can do is to solve problems in a local and limited community.
  • People who are not students could gain both experience and reputation by working on such problems.
  • Companies could sponsor such projects in order to get favorable advertising, improve their reputation, etc.
  • It would be a good idea to develop ways to quickly monetize experience/reputation, to allow people to live well by doing those projects for a living.

2. Dual-use Technologies

For the deaf and HOH (Hard of Hearing), most of the relevant technologies happen to have dual use, starting from Alexander Graham Bell’s telephone. Robert Weitbrecht’s acoustic coupler was useful not only for allowing deaf people to use teletypes over phone lines (and not only over telex lines) but also for other data communication users.

My personal experience was with adding Hebrew support to the Nokia 9110 and Nokia 9210 smartphones at the beginning of the 21st century. Those cellular phones were very useful for the deaf in the pre-SMS era thanks to their ability to send and receive FAX messages. Since Hebrew support was also useful for Hebrew-speaking hearing people, it was a profitable endeavor for Erez Zino and me. See also: כנגד קול הסיכויים (in Hebrew).

A variant of this approach is for biotech and pharma companies, when developing a new technology, to first develop it to treat orphan/rare diseases. This gives them regulatory and reimbursement advantages. Once the technology is developed, it is applied also to common diseases, for which established therapies already exist.

An example is Minovia, which is developing a cell therapy technology to treat mitochondrial diseases. They began by targeting Pearson syndrome, which affects only 100 children worldwide.

3. “Micro-business” methodology and support services

Orphan technologies become orphans because the Hi-Tech world is based upon economies of scale. To develop a technology, you need a sufficiently big market to make it worthwhile. A business needs to have a minimum size to have any chance of success.

A methodology, infrastructure and support services to facilitate “micro-businesses” would overcome the above barrier. A micro-business would be a business which, after a reasonable initial investment in building it, requires no more than a few hours a month and is very profitable (in terms of net income per hour) while serving its very limited market.

One such possibility is to have distributed creativity centers (both physical and on the Web) which help people develop their ideas. Examples: TAMI hackerspace and HAIFAUP.

4. Affluent end-users subsidizing the development

One could get affluent people needing an orphan technology to fund its development. Even if they are few, just one millionaire, with a child afflicted with the problem, could be enough to fund the orphan technology’s development.

Variations of this approach:

  • Government funding of technologies needed to rehabilitate army veterans with disabilities.
  • Collaboration with a non-profit devoted to the disease in question. Some of them have money or access to donors.
  • Philanthropic funding (from people who do not need the orphan technology themselves).
  • A variant of philanthropic funding is to use crowdfunding websites (Headstart, FundIt, PipelBiz, Indiegogo, KickStarter, etc.) to donate to a project.
  • Some companies declare upfront that they will allocate a certain percentage of their profits to social causes (including the development of orphan technologies), without expecting any financial return.

5. Impact Investments

Some people invest not only for profit but also for social impact. They invest in underserved areas where they can see an eventual upside. An example is Social Finance Israel.