Synthetic ♻️

This is the start of the Synthetic channel, created by loleg on January 22, 2022. Any member can join and read this channel. This channel's purpose is: Synthetic data generated from computer simulations or algorithms provides an inexpensive alternative to real-world data that's increasingly used to create accurate AI models.


loleg 14:15

https://www.census.gov/about/what/synthetic-data.html

Census.gov

GitHub - theodi/synthetic-data-tutorial: A hands-on tutorial showing how to use Python to do anonymisation with synthetic data

A hands-on tutorial showing how to use Python to do anonymisation with synthetic data - GitHub - theodi/synthetic-data-tutorial: A hands-on tutorial showing how to use Python to do anonymisation wi...

AIcrowd | NeurIPS 2019 : Disentanglement Challenge | Challenges

AIcrowd | NeurIPS 2019 : Disentanglement Challenge | Challenges

Disentanglement: from simulation to real-world

loleg 23:13

Not to be confused with synthetic biology, here we are talking about data that "are not obtained by direct measurement". This is one of the many excellent topics that were debated by students participating in the University of Bern 3 day mini-course for Open Government Data that I had the pleasure of supporting for the second time.

"Crowdsourcing A.I. to solve real world problems" (AIcrowd slogan)


January 24, 2022
10:43

@jurek joined the channel.

loleg 17:44

:wave: welcome @jurek !


January 25, 2022

loleg 11:53


January 28, 2022

loleg 07:50

Open-sourcing HASH - HASH

A few months ago we revealed we were working on something new. The Block Protocol is now in public draft, and an early preview of our new product, HASH, is available for download on GitHub under an open-source license.


January 31, 2022

loleg 21:11

Eliminate Privacy Concerns With Synthetic Data - Gretel

Train machine learning models on your dataset and generate synthetic data that is statistically equivalent. Get started free with a single click.


February 17, 2022

loleg 09:43

The Economist

OpenMined - Open Collective

OpenMined is an open-source community whose goal is to make the world more privacy-preserving by lowering the barrier-to-entry to private AI technologies.


February 18, 2022

loleg 09:46

Curated Health

𝐔𝐬𝐢𝐧𝐠 #𝐒𝐲𝐧𝐭𝐡𝐞𝐭𝐢𝐜𝐝𝐚𝐭𝐚 𝐟𝐨𝐫 𝐩𝐫𝐞𝐜𝐢𝐬𝐢𝐨𝐧 𝐡𝐞𝐚𝐥𝐭𝐡𝐜𝐚𝐫𝐞 𝐰𝐢𝐭𝐡𝐨𝐮𝐭 𝐝𝐚𝐭𝐚 𝐩𝐫𝐢𝐯𝐚𝐜𝐲 𝐜𝐨𝐧𝐜𝐞𝐫𝐧𝐬!

💻 Synthetic data has been defined as “the generation of artificial data with the aim of reproducing the statistical properties of an original dataset” and it intends to capture the original ...


February 25, 2022

loleg 07:20

Faker | Faker

Generate massive amounts of fake (but reasonable) data for testing and development.

loleg 20:05

A separate, but related, proposal aims to address “synthetic content,” an umbrella term encompassing fake news, synthetic audio, and deepfake images and videos, in which a person’s face is stitched onto someone else using AI. Among the provisions, makers of deepfake software would be required to verify the real names of creators and “conspicuously label” any deepfaked content. Deepfake apps, popular in China, have stirred public debate over privacy and ownership of personal data. https://www.wired.com/story/china-regulate-ai-world-watching/?mc_cid=1e1ad5e9ba&mc_eid=6ebf3bae07


February 26, 2022

loleg 19:12

...For all its benefits, open data also carries risk. Open data can certainly accelerate AI development, but using massive public datasets to train models can unintentionally undermine privacy or perpetuate encoded biases. Even the pioneering ImageNet data faced some of these risks as creators removed people-related categories and blurred individuals’ faces to try to protect their privacy.7 For open datasets released by the public sector, government leaders should be cognizant of the risks and take steps to ensure that open data offers a safe path to future AI. https://www2.deloitte.com/us/en/insights/industry/public-sector/open-data-ai-explainable-trustworthy.html

Deloitte Insights

Trustworthy open data for trustworthy AI

Government leaders need to be cognizant of the risks of using open data to train AI and ensure a safe path to bring benefits to citizens.


March 06, 2022

loleg 16:31

Machine Learning in JS with TensorFlow.js (Part I)

Created for AI Day events in Bangkok (Apr 21, 2018) and Singapore (Apr 29, 2018), this tutorial is based on: TensorFlow.js Introduction to the TensorFlow.js core API Machine Learning in JavaScript (TensorFlow Dev Summit 2018) TensorFlow.js is a general purpose, WebGL-accelerated numerics platform...


April 06, 2022

loleg 10:34

GitHub

katharine jarmul on Twitter

“This is in opposition to data where we learn on real datasets, of real people, often collected without consent and then make "synthetic data" that often leaks private information! See: https://t.co/D7idLIN9EF for more.”


May 19, 2022

loleg 10:01

GitHub

GitHub - ArtLabss/open-data-anonymizer: Python Data Anonymization & Masking Library For Data Science Tasks

Python Data Anonymization & Masking Library For Data Science Tasks - GitHub - ArtLabss/open-data-anonymizer: Python Data Anonymization & Masking Library For Data Science Tasks


June 01, 2022

loleg 11:20

Unity Blog

Boosting computer vision performance with synthetic data | Unity Blog


June 27, 2022

loleg 07:31

Your agent will be evaluated on 1000 episodes and will have a total available time of 48 hours to finish. Your submissions will be evaluated on AWS EC2 p2.xlarge instance which has a Tesla K80 GPU (12 GB Memory), 4 CPU cores, and 61 GB RAM. If you need more time/resources for evaluation of your submission please get in touch.

AI Habitat

Habitat Challenge 2022

Overview In 2022, we are hosting the ObjectNav challenge in the Habitat simulator 1. ObjectNav focuses on egocentric object/scene recognition and a commonsense understanding of object semantics (where is a bed typically located in a house?). For details on how to participate, submit and train age...


July 16, 2022

loleg 05:58

Twitter

Living with Machines on Twitter

“ICYMI: Machine learning models! On Living with Machines we are grappling with large and varied digitised collections. These include textual sources, maps, census records etc. One way in which we can try and manage this scale and complexity is through machine learning. 🧵”


August 17, 2022

loleg 16:38

loleg 20:25

GitHub

OpenBytes

OpenBytes, a Sandbox Project in the LF AI & Data Foundation., aims to bring transformational changes to AI by making open datasets more available & accessible. - OpenBytes


September 07, 2022

loleg 14:14

Unity Blog

Confidential Computing Consortium - Open Source Community

The Confidential Computing Consortium is focused on projects securing data in use and accelerating the adoption of confidential computing.


December 16, 2022

loleg 18:52

GitHub

GitHub - brianvoe/gofakeit: Random fake data generator written in go

Random fake data generator written in go. Contribute to brianvoe/gofakeit development by creating an account on GitHub.


January 09

loleg 15:58

About Syntheticus

Our privacy-preserving synthetic data platform help unlock your data's potential and give you the freedom to use and share data with confidence. Learn more


January 12

loleg 21:21

GitHub

GitHub - karpathy/makemore: An autoregressive character-level language model for making more things

An autoregressive character-level language model for making more things - GitHub - karpathy/makemore: An autoregressive character-level language model for making more things


January 24

loleg 17:44

loleg 20:40

loleg 22:01

Differential Privacy: A Primer for a Non-Technical Audience

loleg 22:18

arXiv.org

Privacy-Preserving Synthetic Location Data in the Real World

Sharing sensitive data is vital in enabling many modern data analysis and machine learning tasks. However, current methods for data release are insufficiently accurate or granular to provide meaningful utility, and they carry a high risk of deanonymization or membership inference attacks. In this...

loleg 22:27

Zero-Knowledge Proofs

Power Automate with copilot; the back story - Microsoft Research

With Satya’s copilot announcements at Microsoft Ignite in the rear-view mirror, it’s a good time to talk more about the kind of work and creative thinking that made it possible. If you aren’t already familiar with the new ways to innovate with AI, such as the AI-based copilot to build your flow i...


loleg:

A de-identification protocol for open data 

Open data initiatives are increasing everywhere. These are often driven by the public sector aiming to promote data availability to trigger innovation and event

forum.opendata.ch

What does the 'E-ID Consortium' do with my data?

Session #2 room C.02 @ Opendata.ch/2019 Participants: ca. 4 Adapted from the original protocol in German of “Was macht das #eID-Konsortium mit meinen Daten?”. Participants’ experiences on the topic Board. Community. Developer. The question of electronic identity is hotly discussed at the Sw...


December 16, 2021

loleg:

ProtonMail Blog

Introducing Privacy Decrypted - ProtonMail Blog

Knowledge is power. Today, we're launching Privacy Decrypted, a new initiative to arm activists with essential knowledge.


January 24, 2022

System

:

10:41

@jurek joined the channel.


April 28, 2022

loleg:

"Upload your unredacted sensitive documents and our magic AI will remove PII for you!"

https://news.ycombinator.com/item?id=25275637 Amnesia – High-Accuracy Data Anonymization (openaire.eu)

https://news.ycombinator.com/item?id=20811372 Presidio: Customizable data protection and PII data anonymization

GitHub

GitHub - psal/anonymouth

Contribute to psal/anonymouth development by creating an account on GitHub.

Almost Secure

Insights from Avast/Jumpshot data: Pitfalls of data anonymization

Analyzing a sample of Jumpshot data confirms the suspicion that Avast did indeed sell personally identifiable data of their users, lots of it.

Twitter

Sharon Zhou on Twitter

“@ZimMatthias @mattlungrenMD @AureliaAugusta @adjiboussodieng @codyaustun @JvNixon @andrey_kurenkov We get the pretrained checkpoint here! Creds to the LSC-CNN folks for releasing their weights. It should be a google drive link if you navigate their README. https://t.co/TSVmE9mtnh”

Weblaw AG

Pseudonym und Anonymität

Sind «Namen Schall und Rauch»? Vermutlich sind sie dies eher nicht, vielmehr aufgeladen mit Bedeutungen, Zuschreibungen und Erwartungen, verbürgen Bonität oder das Gegenteil davon, all dies unter spezifischen Rahmenbedingungen. Das pseudonyme Verwenden eines anderen Namens kann es ermöglichen, di...


November 16, 2022

loleg:

Confidential Computing Consortium

Confidential Computing Consortium - Open Source Community

The Confidential Computing Consortium is focused on projects securing data in use and accelerating the adoption of confidential computing.

System

:

@loleg updated the channel header to:

See also ~Synthetic

loleg:

GitHub

GitHub - microsoft/Tools-for-Health-Data-Anonymization: Set of tools for helping with data (in FHIR format) anonymization.

Set of tools for helping with data (in FHIR format) anonymization. - GitHub - microsoft/Tools-for-Health-Data-Anonymization: Set of tools for helping with data (in FHIR format) anonymization.

GDPR compliant testing.

The free, extendable, open source data anonymization tool.

Image Pasted at 2022-11-16 14-33.png

@jurek will you care to join an upcoming hackathon? This topic is really heating up here in Bern.


November 25, 2022

jurek:

In principle sure :slightly_smiling_face:. Although, I am quite busy at the moment. Are you at REJOHA? If yes maybe we could talk there.

loleg:

As usually am distracted by hundreds of things at a hackathon, only saw your reply now :slightly_smiling_face:


December 20, 2022

loleg:

Anonymisation is for everyone - ODI Learning

Our anonymisation course will introduce you to the latest thinking on the topic. Through practical exercises, we will develop your understanding.