herrmann's blurblog

Embedding JupyterLite In-Browser Notebooks in Documentation and Online Educational Materials by Tony Hirst
Thursday March 10^th, 2022 at 7:08 AM

ouseful.info, the blog…

JupyterLite, if you haven’t already come across it, is an in-browser Jupyter environment that can execute Python (scipy stack) code purely within the browser. The code is executed via a WASM powered pyodide environment (essentially, a virtual machine that runs within your browser to provide a Python environment you can access from a web page). The only downside is it can take what feels like forever to download open (I’m not sure what’s cacheable and what isn’t?).

Several Jupyter environments are available:

the full JupyterLab environment;
a RetroLab jupyter notebook environment;
a REPL console (I really dislike the layout of this – the “command line” is waaaaaaay down the display. I think I’d rather it were at the top of the screen, and the output displayed and scrolled under it…).

All that’s missing is a “single executable cell” mode, although there is an open issue on that. (More generally, better support for opening and saving data files to browser storage, and perhaps even disk/local file system, would also be helpful…)

If you prefer to run classic notebook purely in the browser against a pyodide kernel, use Basthon (this always seems to load much faster…). I’m not sure if English language packs are available for it…

In the JupyterLab and Retrolab environments, notebooks can be saved to and loaded from browser storage (Basthon notebook edits are also persisted in browser storage, I think?).

Interestingly, the environments can be embedded within other web pages. There are already examples of documentation sites starting to explore embedding notebook demos in the documentation website, as, for example, ipycanvas.

On the numpy docs site, a jupyterlite REPL console is embedded alongside code you can copy and paste, and then try out:

A sphinx extension, jupyterlite-sphinx, provides several directives that allow you to embed JupyterLab, Retrolab or the REPL console in Sphinx generated documentation or Jupyter Books. For the RetroLab/RetroLite enviornment, you can specify which notebook you want to embed:

One of the main issues I have with Jupyter Book in an educational use context is the inability to persist any changes the user may make to the code cell.

Embedding a RetroLite notebook gets round this to a certain extent if the user is always working from the same browser (i.e. they can access the same browser storage), because edits will be persisted to browser storage.

There are risks with persisting changes, eg if a student edits and breaks some provided code and can’t fix it, so it’d be handy if there were also a way to cache as read-only the orginal document and allow it to be restored.

Consequently, it might be useful if there were a way to disable the save-to-browser storage both as a notebook toolbar button, and as an environment setting, and expose that setting as a jupyterlite-sphinx parameter. This would then provide the option of making a notebook essentially editable and executable, but not saveable.

When using the REPL environment, I suspect that each REPL session will be in its own environment (*not tested*) which means that state will not persist across REPLs in the same page or across pages in the same book. If a single cell execution enviroment is ever supported, it would be useful for this to have three modes: stateless (always run in a new environment); page stateful (maintain state across all cells within a particular page but not across pages); and book state (persist state across all pages in a book). In the latter two cases, a big button to “restart kernel” and reset the state would also be useful.

Read the whole story

herrmann

1392 days ago

reply

Brazil

Introducing spaCy v2.3
Tuesday July 14^th, 2020 at 12:21 PM

Explosion · RSS Feed

spaCy now speaks Chinese, Japanese, Danish, Polish and Romanian! Version 2.3 of the spaCy Natural Language Processing library adds models for five new languages. We’ve also updated all 15 model families with word vectors and improved accuracy, while also decreasing model size and loading times for models with vectors.

This is the last major release of v2, by the way. We’ve been working hard on spaCy v3, which comes with a lot of cool improvements, especially for training, configuration and custom modeling. We’ll start making prereleases on spacy-nightly soon, so stay tuned.

New languages

spaCy v2.3 provides new model families for five languages: Chinese, Danish, Japanese, Polish and Romanian. The Chinese and Japanese language models are the first provided models that use external libraries for word segmentation rather than spaCy’s tokenizer.

Chinese

The new Chinese models use pkuseg for word segmentation and ship with a custom model trained on OntoNotes with a token accuracy of 94.6%. Users can initialize the tokenizer with both pkuseg and custom models and customize the user dictionary. Details can be found in the Chinese docs. The Chinese tokenizer continues to support jieba as the default word segmenter along with character-based segmentation as in v2.2.

Japanese

The updated Japanese language class switches to SudachiPy for word segmentation and part-of-speech tagging. Using sudachipy greatly simplifies installing spaCy for Japanese, which is now possible with a single command: pip install spacy[ja]. More details are in the Japanese docs.

Model Performance

Following our usual convention, the sm, md and lg models differ in their word vectors. The lg models include one word vector for most words in the training data, while the md model prunes the vectors table to only include entries for the 20,000 most common words, mapping less frequent words to the most similar vector in the reduced table. The sm models do not use pretrained vectors.

Language	Model	Size	TAG	UAS	LAS	ENTS F
Chinese	`zh_core_web_sm`	45 MB	89.63	68.55	63.21	66.57
	`zh_core_web_md`	75 MB	90.23	69.39	64.43	68.52
	`zh_core_web_lg`	575 MB	90.55	69.77	64.99	69.33
Danish	`da_core_news_sm`	16 MB	92.79	80.48	75.65	72.79
	`da_core_news_md`	46 MB	94.13	82.71	78.98	81.45
	`da_core_news_lg`	546 MB	94.95	82.53	78.99	82.73
Japanese	`ja_core_news_sm`	7 MB	97.30	88.68	86.87	59.93
	`ja_core_news_md`	37 MB	97.30	89.26	87.76	67.68
	`ja_core_news_lg`	526 MB	97.30	88.94	87.55	70.48
Polish	`pl_core_news_sm`	46 MB	98.03	85.61	78.09	81.32
	`pl_core_news_md`	76 MB	98.28	90.41	84.47	84.68
	`pl_core_news_lg`	576 MB	98.45	90.80	85.52	85.67
Romanian	`ro_core_news_sm`	13 MB	95.65	87.20	79.79	71.05
	`ro_core_news_md`	43 MB	96.32	88.69	81.77	75.42
	`ro_core_news_lg`	545 MB	96.78	88.87	82.05	76.71

The training data for Danish, Japanese and Romanian is relatively small, so the pretrained word vectors improve accuracy quite a lot, in particular for NER. The Chinese model uses a larger training corpus, but word segmentation errors may make the word vectors less effective. Word segmentation accuracy also explains some of the lower scores for Chinese, as the model has to get the word segmentation correct before it can be scored as accurate on any of the subsequent tasks.

Word vectors for all model families

All model families now include medium and large models with 20k and 500k unique vectors respectively. For most languages, spaCy v2.3 introduces custom word vectors trained using spaCy’s language-specific tokenizers on data from OSCAR and Wikipedia. The vectors are trained using FastText with the same settings as FastText’s word vectors (CBOW, 300 dimensions, character n-grams of length 5).

In particular for languages with smaller training corpora, the addition of word vectors greatly improves the model accuracy. For example, the Lithuanian tagger increases from 81.7% for the small model (no vectors) to 89.3% for the large model. The parser increases by a similar margin and the NER F-score increases from 66.0% to 70.1%. For German, updating the word vectors increases the scores for the medium model for all components by 1.5 percentage points across the board.

Remember that models trained with v2.2 will be incompatible with the new version. To find out if you need to update your models, you can run python -m spacy validate. If you’re using your own custom models, you’ll need to retrain them with the new version.

Updated training data

All spaCy training corpora based on Universal Dependencies corpora have been updated to UD v2.5 (v2.6 for Japanese, v2.3 for Polish). The updated data improves the quality and size of the training corpora, increasing the tagger and parser accuracy for all provided models. For example, the Dutch training data is extended to include both UD Dutch Alpino and LassySmall, which improves the tagger and parser scores for the small models by 3%, and the addition of the new word vectors improve the scores further by 3-5%.

Fine-grained POS tags

As a result of the updates, many of the fine-grained part-of-speech tag sets will differ from v2.2 models. The coarse-grained tag-set remains the same, although there are some minor differences in how they are calculated from the fine-grained tags.

For French, Italian, Portuguese and Spanish, the fine-grained part-of-speech tag sets contain new merged tags related to contracted forms, such as ADP_DET for French "au", which maps to UPOS ADP based on the head "à". This increases the accuracy of the models by improving the alignment between spaCy’s tokenization and Universal Dependencies multi-word tokens used for contractions.

Smaller models and faster loading times

The medium model packages with 20k vectors are at least 2× smaller than in v2.2, the large English model is 120M smaller, and the loading times are 2-4× faster for all models with vectors. To achieve this, models no longer store derivable lexeme attributes such as lower and is_alpha and the remaining lexeme attributes (norm, cluster and prob) have been moved to spacy-lookups-data.

If you’re training new models, you’ll probably want to install spacy-lookups-data for normalization and lemmatization tables! The provided models include the norm lookup tables for use with the core pipeline components, but the optional cluster and prob features are now only available through spacy-lookups-data.

Free online course and tutorials

We’re also proud to announce updates and translations of our online course, “Advanced NLP with spaCy”. We’ve made a few small updates to the English version, including new videos to go with the interactive exercises. It’s really the translations we’re excited about though. We have translations into Japanese, German and Spanish, with Chinese, French and Russian soon to come.

Speaking of videos, you should also check out Sofie’s tutorial on training a custom entity linking model with spaCy. You can find the code and data in our growing projects repository.

Another cool video to check out is the new episode of Vincent Warmerdam’s “Intro to NLP with spaCy” . The series lets you sit beside Vincent as he works through an example data science project using spaCy. In episode 5, “Rules vs. Machine Learning”, Vincent uses spaCy’s rule-based matcher to probe the decisions of the NER model he trained previously, using the rules to understand the model’s behavior and figure out how to improve the training data to get better results.

What’s next?

spaCy v2.3 is the last big release of v2. We’ve been working hard on v3, which we expect to start publishing prereleases of in the next few weeks. spaCy v3 comes with a lot of cool improvements, especially for training, configuration and custom modeling. The training and data formats are the main thing we’ve taken the opportunity to fix, so v3 will have some breaking changes, but don’t worry — it’s nothing like the big transformations seen in libraries like TensorFlow or Angular. It should be pretty easy to upgrade, but we’ve still tried to backport as much as possible into v2.3, so you can use it right away. We’ll also continue to make maintenance releases of v2.3 with bug fixes as they come in.

We also have a big release of our annotation tool Prodigy pretty much ready to go. In addition to the spaCy v2.3 update (giving you all the new models), Prodigy v1.10 comes with a new annotation interface for tasks like relation extraction and coreference resolution, full-featured audio and video annotation (including recipes using pyannote.audio models in the loop), a new and improved manual image UI, more options for NER annotation, new recipe callbacks, and lots more. To get notified when it’s ready, follow us on Twitter!

spaCy v2.3: What’s new in v2.3
Release notes: Detailed overview
spaCy models directory: Download pretrained models

Read the whole story

herrmann

1996 days ago

reply

Brazil

What’s more stressful than lockdown? The easing of lockdown | Suzanne Moore by Suzanne Moore
Monday July 6^th, 2020 at 12:35 PM

The Guardian

We have been through a collective trauma and need time to adjust to the new world. Instead, we are being told to go and have a pint. No wonder we can’t handle it

The least-helpful piece of advice in the world is: “Just be yourself.” (Seriously? It won’t go well.) The second-least-helpful is: “Just act normal.” I have been acting normal sitting on a train in a mask and gloves, while my glasses misted up, telling myself: “This is just fine.” Then I acted normally by sitting in the drizzle outside a pub with a young man desperate for his first pint of Guinness in a while, only to be told by frazzled bar staff that they had no Guinness. Then I acted normally in another small town and had a drink outside! Until I got up to go to the loo, and my mate stopped me. “Don’t go in there,” she said ominously. “I have a bad feeling about it.”

In a world in which we’re all “acting normal”, we shouldn’t have all these bad feelings, should we? But they won’t go away. We are now required to make snap judgments about what is safe and what isn’t because, actually, we don’t really know. Friends have reported “accidentally hugging” their own grownup kids over the weekend. And despite the pictures, most people in Soho in London on Saturday night were not really engaged in some bacchanal. (Nudge, wink, what was all the outcry about, if not homophobia?) The truth is, most of us are edging back into the water, not diving in.

Read the whole story

herrmann

2004 days ago

reply

Brazil

Python Raiz by Osvaldo Santana
Thursday June 25^th, 2020 at 12:25 PM

osantana

Como todos sabem sou programador Python a quase duas décadas e boa parte da minha carreira foi construída em torno desta linguagem. E isso mostra como ela é excelente. Pra uma pessoa como eu, que adora aprender e estudar linguagens de programação, significa muito priorizar uma só linguagem por quase metade da vida.

Mas o que me motivou a escrever esse texto aqui foram as adições recentes (ou nem tão recentes) à linguagem. Coisas como suporte para desenvolvimento assíncrono (async, await, etc), anotações de tipos, f-strings, operador “walrus” (:=), e a mais recente delas (cuja PEP ainda está em “Draft”): Pattern Matching.

Python Nutella

O conceito de Pattern Matching já existe a bastante tempo (Prolog?) mas ficou mais popular com a recente adoção da excelente linguagem Elixir por muitos desenvolvedores.

O uso mais básico de Pattern Matching pode ser visto nas instruções switch/case presente em várias linguagens de programação estruturadas. Mas Pattern Matching não é “só” um switch/case. Nos tradicionais comandos de switch/case as linguagens avaliam uma expressão (switch expr) e dependendo do resultado da expressão ele busca um bloco de código que dê “match” com esse resultado (case constant-expr).

Mas em linguagens como Elixir (que é a que usarei como exemplo por conhecer melhor) esse “match” pode ser feito com regras muito mais elaboradas e usar estruturas de dados completas no lugar de apenas um valor constante, como nas linguagens que mencionei acima.

Abaixo vou colocar um exemplo de código pseudo-Python que demonstra como uma função pode ser implementada sem e com Pattern Matching:

Esse caso mostra como o conceito é poderoso e prático para resolver uma série de problemas. E a sintaxe parece razoavelmente natural. Mas parece que essa “naturalidade” some quando avançamos pelos exemplos da PEP e começamos a ver coisas como:

O que me assusta em tudo isso são as invenções de coisas que nunca estiveram no Python:

_ como caracter coringa: o caracter _ sempre foi usado, por convenção, como uma variável cujo valor pode ser descartado. Um exemplo disso seria fazer coisas como: name, url, *_ = 'name,url,extra,data'.split(','). Porque não usar a cláusula else: com o match?
Dotted-names: Os casos de Constant Value Patterns onde tiveram que inventar uma sintaxe nova com dotted-names é outra coisa que nunca vi no Python. Essa sintaxe foi descartada anteriormente para coisas muito mais simplórias como quando pediram algo parecido com with do Pascal.
| como operador de alternativa: | já existe em Python e ele faz as vezes de OR bit-a-bit. Também temos o or que faz as vezes de OR lógico. Me parece que, mesmo não sendo semanticamente a mesma coisa, o segundo OR lógico faz mais sentido do que o OR bit-a-bit. Uma opção aceitável, por familiaridade, seria o uso de || que também é usado como OR lógico em diversas linguagens.

Mas o que mais me desagrada nessas recentes adições à linguagem é a preocupação que tenho com “legibilidade natural da linguagem“. E não, essa legibilidade não tem relação direta com escrever código limpo e organizado para programadores Python. Tem relação com a capacidade de um código ser compreendido até mesmo por algum desenvolvedor que não conhece a linguagem.

Aprendendo Python

Python não foi minha primeira linguagem. Já conhecia outras antes dela. Essas linguagens eram majoritariamente estruturadas mas eu já tinha brincado um pouco com desenvolvimento orientado à objetos com Object Pascal do Turbo Pascal 6 e 7 (com Turbo Vision e tal).

Meu primeiro contato com Python se deu por volta do ano 2000 na extinta e saudosa Conectiva (aquela do Conectiva Linux que depois se juntou com a Mandrake pra formar a Mandriva que também sumiu… enfim… uma bagunça ).

Eu estava trabalhando em um projeto de “compilador de configuração para interfaces gráficas”. Esse é o nome chique que dei agora… na época era só um programa escrito em C que geraria os arquivos texto de configuração para Gnome, KDE e WindowMaker à partir de um conjunto de configurações centralizadas.

Mas o importante para a discussão de agora é a parte do “escrito em C” e “arquivos texto”. Todos sabem que C e texto não se harmonizam muito bem, certo? Então… no final tudo daria certo. Só demoraria mais tempo pra ficar pronto.

Certo dia meu chefe me trouxe um artigo que ensinava Python. Ele havia traduzido o artigo original para o português e pediu pra que eu desse uma revisada (acho que a intenção dele era outra já nem sabia inglês direito). O artigo original não está mais no ar mas tenho uma cópia dele (a tradução se perdeu).

A versão do Python, naquela época, era 1.5.2 (o artigo foi atualizado depois que eu li para acrescentar coisas do Python 2). E lendo só esse artigo eu aprendi Python em 1, uma, UMA ~~f*cking~~ noite! Com um artigo de blog!

Eu aprendi Python em uma noite porque eu sou inteligente? Esperto? Super-humano? Não! Dêem uma olhada no post.

Eu espero.

Viram como a linguagem do artigo é simples? Ela é legível, as construções dela são intuitivas: v = 1 atribui valor 1 pra v, v == 2 compara valor v com 2, v[0] acessa o primeiro elemento de um array/lista, class Person: ... define uma classe, e assim se segue. Ou seja, para uma pessoa que programa minimamente em algo conseguia aprender a linguagem muito rápido.

Tinham pouca coisa estranha nessa linguagem que não existia em outras. Talvez o fato de usar indentação pra delimitar os blocos de código e aquele parâmetro self nos métodos de instância. Mas tirando isso é tudo bem normalzinho.

Agora imaginem uma pessoa aprendendo Python com coisas como:

Imagina esbarrar com um código desses logo de primeira? O cara volta pro Perl Just kidding…

Eu sei que o código do artigo é praticamente o mesmo para rodar no Python de hoje (talvez só o uso de print() e input() tenha mudado com o Python 3).

Também sei que o artigo não ensina tudo sobre a linguagem (mesmo naquela versão da época). Ele é só uma introdução. E de fato, quando decidi me aprofundar mais no aprendizado da linguagem eu fui atrás de outros materiais. Li o The Python Tutorial que vem com a própria linguagem e na seqüencia importei um livro que realmente ensinava a linguagem toda: Learning Python da editora O’Reilly.

Naquela época eu comprei a primeira edição do livro que, hoje, já está na quinta edição que cobre até o Python 3.3.

Finalmente eu pude estudar tudo o que Python tinha. E até hoje eu recomendo esse livro. Mas vou falar uma curiosidade sobre ele: a 1ª edição tinha 384 páginas. A 5ª edição tem 1648 páginas! Mais de 4 vezes maior. E nem descrevem as (muitas) novidades do Python 3.4, 3.5, 3.6, 3.7, etc.

Ou seja, se você realmente pretende dominar tudo o que a linguagem oferece vai levar uma vida. Além do Learning Python eu ainda recomendo o Fluent Python (assim que a segunda edição sair eu atualizo o link) do meu querido amigo Luciano Ramalho.

Python para não-programadores

Todo mundo sabe que Python tem crescido muito na comunidade científica. Isso aconteceu muito graças à iniciativas como SciPy e de projetos que nasceram dentro dessa iniciativa e ganharam vida própria como o Jupyter, matplotlib, pandas, scikit-learn, etc.

Por outro lado Python também se tornou uma linguagem muito usada no ensino de programação de diversas escolas e universidades em todo o mundo. Usam ela para ensinar programação para todo mundo e não só para alunos dos cursos de computação e afins.

E essas duas coisas estão relacionadas. Se você ensina programação com Python para um aluno de biologia qual linguagem ele vai usar para escrever uma ferramenta que auxilia num trabalho de genética?

Porque essas escolas escolheram Python para ensinar programação Porque projetos como SciPy e Jupiter escolheram usar Python? Porque não escolheram outras linguagens?

Eu suspeito de que seja um conjunto de atributos da linguagem, entre eles:

Educacional: Python nasceu de um projeto de linguagem educacional (ABC), mas também nasceu porque seu criador (Guido van Rossum) acreditava que linguagens educacionais não precisavam ser de brinquedo (toy languages) e que deveria ser possível usá-las no dia-a-dia.
Multiplataforma: instalação, implementação e uso fácil nas principais plataformas disponíveis.
Multipropósito: você consegue desenvolver software de linha de comando mas também consegue implementar uma interface gráfica ou um servidor Web.
Multiparadigma: sabe programação estruturada? ok. Sabe modelagem OO? ok também. Sou craque em lambda functions? tá lá também um básico pra você usar.
Facilidade de integração com bibliotecas em de outras linguagens como C, Fortran, etc. Vocês devem imaginar o imenso número de bibliotecas científicas implementadas em outras linguagens. Porque reescrever tudo?

Mas o que eu acho mais importante para essa escolha é a de que todos os cientistas que já programaram alguma coisa para seu trabalho conseguem ver um código Python e ter uma ideia, mesmo que superficial, do que aquilo faz. É a tal “legibilidade natural da linguagem” que já mencionei.

Multipropósito

Se você quer um carro que comporte toda a família, seja potente, veloz, econômico, tenha porta-malas grande, seja espaçoso, confortável, etc você provavelmente vai acabar com isso aqui:

Python nasceu pra ser fácil de aprender e usar. Ter uma sintaxe limpa, clara e familiar. Sem coisas esdrúxulas como símbolos em excesso ou coisas menos convencionais como a sintaxe object message do Smalltalk (que inspira a sintaxe do Ruby).

Python também nasceu sem tipos ou anotações de tipos. Tudo nela foi pensada para abstrair esse conceito “mundano” da computação. Isso facilita o aprendizado das pessoas.

Python nunca pretendeu ter uma performance absurda. Só a performance necessária para permitir o seu uso em problemas reais. Se você precisasse de performance absurda você provavelmente escreveria código em C ou Assembly e “colaria” ele com Python.

Python nunca foi pensada para concorrência ou paralelismo. Então não dá pra “competir” com linguagens como Go ou Elixir (Erlang) nesse quesito. Essas linguagens nasceram pra lidar com esse tipo de problema. Acrescentar um punhado de palavras reservadas e algumas bibliotecas não torna Python ideal para esse propósito.

Python não nasceu como linguagem funcional. Mesmo ela tendo ferramentas que te permitam expressar algumas ideias de modo funcional ela não é uma linguagem funcional de verdade. Python nasceu como linguagem majoritariamente orientada à objetos.

Colocar um map() aqui e um lambda acolá não torna a linguagem própria para resolver problemas que são mais facilmente solucionáveis com linguagens funcionais. Python tem objetos mutáveis. Python não tem suporte nativo à tail-recursion. Macros? O próprio Guido já disse que “nem morto” (não achei a referência, mas acredite, eu vi isso).

Guarda-Chuva

Python já não é mais uma linguagem de nicho ou underground. Python já é mainstream e centenas ou milhares de empresas de diversos portes já usam ela em seus negócios.

Como consequência disso o número de vagas de emprego e trabalhos com Python cresceu vertiginosamente nos últimos anos e por isso, inevitavelmente, programadores de outras linguagens acabam tendo que lidar com Python em algum momento de suas carreiras.

Por conta dessa situação eu tenho a sensação de que a necessidade de adicionar certas funcionalidades no Python vem do desejo desses programadores de usar parte favorita da outra linguagem também em Python porque eles não conseguem pensar “do jeito Python”.

Lembro bem quando Java estava na moda e todo programador Python criava getters/setters nas classes. Inventavam mil maluquices como Interfaces (no Zope), protocols, … para terem algo parecido com o que Java oferecia. Um baita esforço pra programar Java em Python.

Hoje em dia parece que tá todo mundo querendo programar Elixir em Python, JS/Node em Python, …

Sem Conclusão

Vocês devem estar imaginando que estou ficando desgostoso com a linguagem ou que vou abandoná-la. Não estou não. As coisas novas que não gosto de usar irei ignorar (ex. operador :=). Outras que me fizeram até torcer o nariz mas passei a gostar entraram pro meu repertório (ex. f-strings).

Tem aquelas funcionalidades que me fazem torcer o nariz, e quando dei uma chance pra elas descobri que podem ser úteis se forem usadas com muita moderação (ex. type annotation) serão usadas com moderação.

Enfim… Vou continuar a usar e a gostar de Python. Até porque ela tem tantas qualidades que me deixou preguiçoso para mudar. Mas pretendo continuar a programar Python em Python. Daquele jeito raiz. Daquele jeito moleque… aquela programação Python de várzea.

Update: corrigido um erro no operador “OR lógico” para or e adicionei uma sugestão de uso do ||.

The post Python Raiz appeared first on osantana.

Read the whole story

herrmann

2015 days ago

reply

Brazil

Brazil's Fake News Bill Would Dismantle Crucial Rights Online and is on a Fast Track to Become Law by Veridiana Alimonti
Thursday June 25^th, 2020 at 12:21 PM

Deeplinks

Update: A new draft text was released shortly before the voting set for June 25^th. It doesn’t include blocking and data localization measures, but the surveillance and identification rules remain. Read more in the analysis of a coalition of digital rights groups in Brazil.

Despite widespread complaints about its effects on free expression and privacy, Brazilian Congress is moving forward in its attempts to hastily approve a "Fake News" bill. We've already reported about some of the most concerning issues in previous proposals, but the draft text released this week is even worse. It will hinder users' access to social networks and applications, require the construction of massive databases of users' real identities, and oblige companies to keep track of our private communications online.

It creates demands that disregard Internet key characteristics like end-to-end encryption and decentralised tool-building, running afoul of innovation, and could criminalize the online expression of political opinions. Although the initial bill arose as an attempt to address legitimate concerns on the spread of online disinformation, it has opened the door to arbitrary and unnecessary measures, that strike settled privacy and freedom of expression safeguards.

You can join the hundreds of other protestors and organizations telling Brazil’s lawmakers why not to approve this Fake News bill right now.

Here’s how the latest proposals measure up:

Providers Are Required to Retain the Chain of Forwarded Communications

Social networks and any other Internet application that allows social interaction would be obliged to keep the chain of all communications that have been forwarded, whether distribution of the content was done maliciously or not. This is a massive data retention obligation which would affect millions of innocent users instead of only those investigated for an illegal act. Although Brazil already has obligations for retaining specific communications metadata, the proposed rule goes much further. Piecing together a communication chain may reveal highly sensitive aspects of individuals, groups, and their interactions -- even when none are actually involved in illegitimate activities. The data will end up as a constantly-updated map of connections and relations between nearly every Brazilian Internet user: it will be ripe for abuse.

Furthermore, this obligation disregards the way more decentralized communication architectures work. It assumes that application providers are always able to identify and distinguish forwarded and non-forwarded content, and also able to identify the origin of a forwarded message. In practice, this depends on the design of the service and on the relationship between applications and services. When the two are independent it is common that the service provider will not be able to differentiate between forwarded and non-forwarded content, and that the application does not store the forwarding history except on the user's device. This architectural separation is traditional in Internet communications, including web browsers, FTP clients, email, XMPP, file sharing, etc. All of them allow actions equivalent to the forwarding of contents or the act of copying and pasting them, where the client application and its functions are technically and legally independent from the service to which it connects. The obligation would also negatively impact open source applications, designed to let end-users not only understand but also to modify and adapt the functioning of local applications.

It Compels Applications to Get All User's ID and Cell Phone Numbers

The bill creates a general monitoring obligation on user's identity, compelling Internet applications to require all users to give proof of identity through a national ID or passport, as well as their phone number. This requirement goes in the opposite direction to the principles and safeguards set out in the country's data protection law which is yet to enter into force. A vast database of identity cards, held by private actors, is in no way aligned with the standards of data minimization, purpose limitation and the prevention of risks in processing and storing personal data that Brazil’s data protection law represents.

Current versions of the "Fake News" Bill do not even ensure the use of pseudonyms for Internet users. As we've said many times before, there are myriad reasons why individuals may wish to use a name other than the one they have on their IDs and were born with. Women rebuilding their lives despite the harassment of domestic violence abusers, activists and community leaders facing threats, investigative journalists carrying out sensitive research in online groups, transgender users affirming their identities are just a few of examples of the need for pseudonymity in a modern society.

Under the new bill, users' accounts would be linked to their cell phone numbers, allowing -- and in some cases requiring -- telecom service providers and Internet companies to track users even more closely. Anyone without a mobile number would be prevented from using any social network -- if users' numbers are disabled for any reason, their social media accounts would be suspended. In addition to privacy harms, the rule creates serious hurdles to speak, learn, and share online.

Censorship, Data Localization, and Blocking

These proposals seriously curb the online expression of political opinions and could quickly lead to political persecution. The bill sets high fines in cases of online sponsored content that mocks electoral candidates or question election reliability. Although elections' trustworthiness is crucial for democracy and disinformation attempts to disrupt it should be properly tackled, a broad interpretation of the bill would severely endanger the vital work of e-voting security researchers in preserving that trustworthiness and reliability. Electoral security researchers already face serious harassment in the region. Other new and vague criminal offenses set by the bill are prone to silence legitimate critical speech and could criminalize users' routine actions without the proper consideration of malicious intent.

The bill revives the disastrous idea of data localization. One of its provisions would force social networks to store user data in a special database that would be required to be hosted in Brazil. Data localization rules such as this can make data especially vulnerable to security threats and surveillance, while also imposing serious barriers to international trade and e-commerce.

Finally, as the icing on the cake of a raft of provisions that disregard the Internet's global nature, providers that fail to comply with the rules would be subject to a suspension penalty. Such suspensions are unjustifiable and disproportionate, curtailing the communications of millions of Brazilians and incentivizing applications to overcompliance in the detriment of users' privacy, security, and free expression.

EFF has joined many other organizations across the world calling on the Brazilian parliament to reject the latest version of the bill and stop the fast-track mode that has been adopted. You can also take action against the "Fake News" bill now, with our Twitter campaign aimed at senators of the National Congress.

Read the whole story

herrmann

2015 days ago

reply

Brazil

.ORG Domain Registry Sale to Ethos Capital Rejected in Stunning Victory for Public Interest Internet by Karen Gullo
Tuesday May 19^th, 2020 at 2:05 AM

Deeplinks

ICANN Withholds Consent, Says Deal Lacked ‘Meaningful Plan to Protect’ .ORG Community

San Francisco—In an important victory for thousands of public interest groups around the world, a proposal to sell the .ORG domain registry to private equity firm Ethos Capital and convert it to a for-profit entity was rejected late yesterday by the Internet Corporation for Assigned Names and Numbers (ICANN).

The Electronic Frontier Foundation (EFF), which worked hand in hand with Access Now, NTEN, National Council of Nonprofits, Americans for Financial Reform, and many other organizations to oppose the sale, applauds ICANN’s well-reasoned decision to stop the $1.1 billion transaction from moving forward. In a statement, ICANN said rejecting the deal was the right thing to do because it lacked a meaningful plan to protect the interests of nonprofits and NGOs that rely on the .ORG registry to exist on the Internet and connect with the people they serve.

The sale would change Public Interest Registry (PIR), the nonprofit operator of .ORG, into an entity bound to serve the interests of its corporate stakeholders, not the nonprofit world. ORG is the third-largest Internet domain name registry, with over 10 million domain names held by a diverse group of charities, public interest organizations, and nonprofits, from the Girl Scouts of America and American Bible Society to Farm Aid and Meals On Wheels.

“We’re gratified that ICANN listened to the .ORG community, which was united in its opposition to the sale,” said EFF Senior Staff Attorney Mitch Stoltz. “Under the deal, .ORG would be converted to a for-profit entity controlled by domain name industry insiders and their secret investors. Nonprofits are vulnerable to the governments and corporations who they often seek to hold accountable. The public interest community rightly questioned whether an owner motivated by profits would stand up to demands for censorship of charities who rely on .ORG so that people can find and rely on their vital services.”

“The sale of .ORG was announced, without .ORG community input, not long after price caps on registration fees for domain names were lifted and PIR acquired new powers to allegedly ‘protect’ the rights of third parties,” said EFF Staff Attorney Cara Gagliano. “It was obvious to many that .ORG registrants could face higher operating costs and degradation of service as Ethos sought to increase fees and seek profitable arrangements with businesses keen to silence nonprofits. This concern grew after it was revealed that the transaction required taking on a $360 million debt obligation.”

If PIR wishes to press forward, it still must seek approval from courts in the state of Pennsylvania, where PIR is incorporated. As part of that process, the Pennsylvania state Attorney General may weigh in. EFF urges both to follow ICANN’s lead and reject the transaction. This will pave the way for a transparent process to select a new operator for .ORG that will act in the interests of the nonprofits that it serves.

Contact:

Mitch

Stoltz

Senior Staff Attorney

mitch@eff.org

Cara

Gagliano

Staff Attorney

cara@eff.org

Read the whole story

herrmann

2052 days ago

reply

Brazil

Embedding JupyterLite In-Browser Notebooks in Documentation and Online Educational Materials by Tony Hirst Thursday March 10th, 2022 at 7:08 AM

Introducing spaCy v2.3 Tuesday July 14th, 2020 at 12:21 PM

New languages

Chinese

Japanese

Model Performance

Word vectors for all model families

Updated training data

Fine-grained POS tags

Smaller models and faster loading times

Free online course and tutorials

What’s next?

What’s more stressful than lockdown? The easing of lockdown | Suzanne Moore by Suzanne Moore Monday July 6th, 2020 at 12:35 PM

Python Raiz by Osvaldo Santana Thursday June 25th, 2020 at 12:25 PM

Python Nutella

Aprendendo Python

Python para não-programadores

Multipropósito

Guarda-Chuva

Sem Conclusão

Brazil's Fake News Bill Would Dismantle Crucial Rights Online and is on a Fast Track to Become Law by Veridiana Alimonti Thursday June 25th, 2020 at 12:21 PM

Providers Are Required to Retain the Chain of Forwarded Communications

It Compels Applications to Get All User's ID and Cell Phone Numbers

Censorship, Data Localization, and Blocking

.ORG Domain Registry Sale to Ethos Capital Rejected in Stunning Victory for Public Interest Internet by Karen Gullo Tuesday May 19th, 2020 at 2:05 AM

Embedding JupyterLite In-Browser Notebooks in Documentation and Online Educational Materials by Tony Hirst
Thursday March 10^th, 2022 at 7:08 AM

Introducing spaCy v2.3
Tuesday July 14^th, 2020 at 12:21 PM

What’s more stressful than lockdown? The easing of lockdown | Suzanne Moore by Suzanne Moore
Monday July 6^th, 2020 at 12:35 PM

Python Raiz by Osvaldo Santana
Thursday June 25^th, 2020 at 12:25 PM

Brazil's Fake News Bill Would Dismantle Crucial Rights Online and is on a Fast Track to Become Law by Veridiana Alimonti
Thursday June 25^th, 2020 at 12:21 PM

.ORG Domain Registry Sale to Ethos Capital Rejected in Stunning Victory for Public Interest Internet by Karen Gullo
Tuesday May 19^th, 2020 at 2:05 AM