Synthetic ♻️

This is the start of the Synthetic channel, created by loleg on January 22, 2022. Any member can join and read this channel. This channel's purpose is: Synthetic data generated from computer simulations or algorithms provides an inexpensive alternative to real-world data that's increasingly used to create accurate AI models.

loleg 14:15

https://www.census.gov/about/what/synthetic-data.html

Census.gov

What Are Synthetic Data?

Example datasets:

Verkehrsmodellierung im UVEK (VM-UVEK): Daten des Nationalen Personenverkehrsmodells 2017
Synthetic end-use specific electric household load profiles for four weather years in 29 European countries
Energy Distribution Network Generator based on real world data

https://github.com/theodi/synthetic-data-tutorial/

GitHub

GitHub - theodi/synthetic-data-tutorial: A hands-on tutorial showing how to use Python to do anonymisation with synthetic data

A hands-on tutorial showing how to use Python to do anonymisation with synthetic data - GitHub - theodi/synthetic-data-tutorial: A hands-on tutorial showing how to use Python to do anonymisation wi...

Open-sourcing HASH - HASH

A few months ago we revealed we were working on something new. The Block Protocol is now in public draft, and an early preview of our new product, HASH, is available for download on GitHub under an open-source license.

January 31, 2022

loleg 21:11

https://gretel.ai/synthetics

Eliminate Privacy Concerns With Synthetic Data - Gretel

Train machine learning models on your dataset and generate synthetic data that is statistically equivalent. Get started free with a single click.

February 17, 2022

loleg 09:43

https://www.economist.com/science-and-technology/the-un-is-testing-technology-that-processes-data-confidentially/21807385

The Economist

The UN is testing technology that processes data confidentially

How to analyse data without revealing their secrets

via https://opencollective.com/openmined

OpenMined - Open Collective

OpenMined is an open-source community whose goal is to make the world more privacy-preserving by lowering the barrier-to-entry to private AI technologies.

February 18, 2022

loleg 09:46

https://curated.health/articles-kbzfijbd/post/post-slug-9Vu5gm3KEt48xcD

Curated Health

𝐔𝐬𝐢𝐧𝐠 #𝐒𝐲𝐧𝐭𝐡𝐞𝐭𝐢𝐜𝐝𝐚𝐭𝐚 𝐟𝐨𝐫 𝐩𝐫𝐞𝐜𝐢𝐬𝐢𝐨𝐧 𝐡𝐞𝐚𝐥𝐭𝐡𝐜𝐚𝐫𝐞 𝐰𝐢𝐭𝐡𝐨𝐮𝐭 𝐝𝐚𝐭𝐚 𝐩𝐫𝐢𝐯𝐚𝐜𝐲 𝐜𝐨𝐧𝐜𝐞𝐫𝐧𝐬!

💻 Synthetic data has been defined as “the generation of artificial data with the aim of reproducing the statistical properties of an original dataset” and it intends to capture the original ...

February 25, 2022

loleg 07:20

https://fakerjs.dev/

Faker | Faker

Generate massive amounts of fake (but reasonable) data for testing and development.

loleg 20:05

A separate, but related, proposal aims to address “synthetic content,” an umbrella term encompassing fake news, synthetic audio, and deepfake images and videos, in which a person’s face is stitched onto someone else using AI. Among the provisions, makers of deepfake software would be required to verify the real names of creators and “conspicuously label” any deepfaked content. Deepfake apps, popular in China, have stirred public debate over privacy and ownership of personal data. https://www.wired.com/story/china-regulate-ai-world-watching/?mc_cid=1e1ad5e9ba&mc_eid=6ebf3bae07

February 26, 2022

loleg 19:12

...For all its benefits, open data also carries risk. Open data can certainly accelerate AI development, but using massive public datasets to train models can unintentionally undermine privacy or perpetuate encoded biases. Even the pioneering ImageNet data faced some of these risks as creators removed people-related categories and blurred individuals’ faces to try to protect their privacy.7 For open datasets released by the public sector, government leaders should be cognizant of the risks and take steps to ensure that open data offers a safe path to future AI. https://www2.deloitte.com/us/en/insights/industry/public-sector/open-data-ai-explainable-trustworthy.html

Deloitte Insights

Trustworthy open data for trustworthy AI

Government leaders need to be cognizant of the risks of using open data to train AI and ensure a safe path to bring benefits to citizens.

March 06, 2022

loleg 16:31

https://observablehq.com/@tvirot/machine-learning-in-javascript-with-tensorflow-js

Machine Learning in JS with TensorFlow.js (Part I)

Created for AI Day events in Bangkok (Apr 21, 2018) and Singapore (Apr 29, 2018), this tutorial is based on: TensorFlow.js Introduction to the TensorFlow.js core API Machine Learning in JavaScript (TensorFlow Dev Summit 2018) TensorFlow.js is a general purpose, WebGL-accelerated numerics platform...

April 06, 2022

loleg 10:34

https://github.com/SchweizerischeBundesbahnen/SynPopToolbox

GitHub

GitHub - SchweizerischeBundesbahnen/SynPopToolbox: SynPopToolbox is a Python framework designed for analysis, visualization and manipulation of a synthetic population produced by the land-use simulation software FaLC (https://github.com/falc-sim-org/FaLC) and related subproducts. Contact: davi.gu...

SynPopToolbox is a Python framework designed for analysis, visualization and manipulation of a synthetic population produced by the land-use simulation software FaLC (https://github.com/falc-sim-or...

loleg 22:44

https://twitter.com/kjam/status/1506662887108009994

Twitter

katharine jarmul on Twitter

“This is in opposition to data where we learn on real datasets, of real people, often collected without consent and then make "synthetic data" that often leaks private information! See: https://t.co/D7idLIN9EF for more.”

May 19, 2022

loleg 10:01

https://github.com/ArtLabss/open-data-anonymizer

GitHub

GitHub - ArtLabss/open-data-anonymizer: Python Data Anonymization & Masking Library For Data Science Tasks

Python Data Anonymization & Masking Library For Data Science Tasks - GitHub - ArtLabss/open-data-anonymizer: Python Data Anonymization & Masking Library For Data Science Tasks

June 01, 2022

loleg 11:20

https://blog.unity.com/technology/boosting-computer-vision-performance-with-synthetic-data

Unity Blog

Boosting computer vision performance with synthetic data | Unity Blog

June 27, 2022

loleg 07:31

https://aihabitat.org/challenge/2022/

Your agent will be evaluated on 1000 episodes and will have a total available time of 48 hours to finish. Your submissions will be evaluated on AWS EC2 p2.xlarge instance which has a Tesla K80 GPU (12 GB Memory), 4 CPU cores, and 61 GB RAM. If you need more time/resources for evaluation of your submission please get in touch.

AI Habitat

Habitat Challenge 2022

Overview In 2022, we are hosting the ObjectNav challenge in the Habitat simulator 1. ObjectNav focuses on egocentric object/scene recognition and a commonsense understanding of object semantics (where is a bed typically located in a house?). For details on how to participate, submit and train age...

July 16, 2022

loleg 05:58

https://twitter.com/LivingwMachines/status/1547870794490859521?s=20&t=syttclhwDyNiKx_RGl7ZUg

Twitter

Living with Machines on Twitter

“ICYMI: Machine learning models! On Living with Machines we are grappling with large and varied digitised collections. These include textual sources, maps, census records etc. One way in which we can try and manage this scale and complexity is through machine learning. 🧵”

August 17, 2022

loleg 16:38

https://www.turing.ac.uk/events/turing-roche-knowledge-share-series-synthetic-data

loleg 20:25

https://github.com/Project-OpenBytes

GitHub

OpenBytes

OpenBytes, a Sandbox Project in the LF AI & Data Foundation., aims to bring transformational changes to AI by making open datasets more available & accessible. - OpenBytes

September 07, 2022

loleg 14:14

https://blog.unity.com/technology/use-unitys-perception-tools-to-generate-and-analyze-synthetic-data-at-scale-to-train

Unity Blog

Use Unity’s computer vision tools to generate and analyze synthetic data at scale to train your ML models | Unity Blog

November 16, 2022
14:30

@loleg updated the channel header to:

Confidential Computing Consortium - Open Source Community

The Confidential Computing Consortium is focused on projects securing data in use and accelerating the adoption of confidential computing.

December 16, 2022

loleg 18:52

https://github.com/brianvoe/gofakeit

GitHub

GitHub - brianvoe/gofakeit: Random fake data generator written in go

Random fake data generator written in go. Contribute to brianvoe/gofakeit development by creating an account on GitHub.

January 09

loleg 15:58

https://syntheticus.ai/about-syntheticus

About Syntheticus

Our privacy-preserving synthetic data platform help unlock your data's potential and give you the freedom to use and share data with confidence. Learn more

January 12

loleg 21:21

https://github.com/karpathy/makemore

GitHub

GitHub - karpathy/makemore: An autoregressive character-level language model for making more things

An autoregressive character-level language model for making more things - GitHub - karpathy/makemore: An autoregressive character-level language model for making more things

January 24

loleg 17:44

https://petlab.officialstatistics.org/

loleg 20:40

https://courses.openmined.org/courses/foundations-of-private-computation

loleg 22:01

https://dash.harvard.edu/handle/1/38323292

Differential Privacy: A Primer for a Non-Technical Audience

loleg 22:18

https://arxiv.org/abs/2108.02089v1

arXiv.org

Privacy-Preserving Synthetic Location Data in the Real World

Sharing sensitive data is vital in enabling many modern data analysis and machine learning tasks. However, current methods for data release are insufficiently accurate or granular to provide meaningful utility, and they carry a high risk of deanonymization or membership inference attacks. In this...

loleg 22:27

https://zkp.science/

Zero-Knowledge Proofs

What is a zero-knowledge proof?

What are they, how do they work, and are they fast yet?

https://admindatahandbook.mit.edu/book/v1.0/diffpriv.html#sec:dp-intro

6 Designing Access with Differential Privacy | Handbook on Using Administrative Data for Research and Evidence-based Policy

via https://docs.opendp.org/en/stable/user/getting-started.html

February 04

loleg 22:59

https://gitlab.com/healthdatahub/tsfaker

GitLab

Health Data Hub / tsfaker · GitLab

Generate tabular fake data conforming to a Table Schema

March 22

loleg 13:46

https://www.microsoft.com/en-us/research/group/dynamics-insights-apps-artificial-intelligence-machine-learning/articles/power-automate-with-copilot-the-back-story/

Microsoft Research

Power Automate with copilot; the back story - Microsoft Research

With Satya’s copilot announcements at Microsoft Ignite in the rear-view mirror, it’s a good time to talk more about the kind of work and creative thinking that made it possible. If you aren’t already familiar with the new ways to innovate with AI, such as the AI-based copilot to build your flow i...

loleg:

13:42

https://iapp.org/news/a/a-de-identification-protocol-for-open-data/

A de-identification protocol for open data

Open data initiatives are increasing everywhere. These are often driven by the public sector aiming to promote data availability to trigger innovation and event

https://forum.opendata.ch/t/what-does-the-e-id-consortium-do-with-my-data/556

forum.opendata.ch

What does the 'E-ID Consortium' do with my data?

Session #2 room C.02 @ Opendata.ch/2019 Participants: ca. 4 Adapted from the original protocol in German of “Was macht das #eID-Konsortium mit meinen Daten?”. Participants’ experiences on the topic Board. Community. Developer. The question of electronic identity is hotly discussed at the Sw...

December 16, 2021

loleg:

14:39

https://protonmail.com/blog/introducing-privacy-decrypted/

ProtonMail Blog

Introducing Privacy Decrypted - ProtonMail Blog

Knowledge is power. Today, we're launching Privacy Decrypted, a new initiative to arm activists with essential knowledge.

January 24, 2022

System

10:41

@jurek joined the channel.

April 28, 2022

loleg:

09:14

https://news.ycombinator.com/item?id=21339374

"Upload your unredacted sensitive documents and our magic AI will remove PII for you!"

https://news.ycombinator.com/item?id=25275637 Amnesia – High-Accuracy Data Anonymization (openaire.eu)

https://news.ycombinator.com/item?id=20811372 Presidio: Customizable data protection and PII data anonymization

https://github.com/psal/anonymouth

GitHub

GitHub - psal/anonymouth

Contribute to psal/anonymouth development by creating an account on GitHub.

https://palant.info/2020/02/18/insights-from-avast/jumpshot-data-pitfalls-of-data-anonymization/

Almost Secure

Insights from Avast/Jumpshot data: Pitfalls of data anonymization

Analyzing a sample of Jumpshot data confirms the suspicion that Avast did indeed sell personally identifiable data of their users, lots of it.

https://twitter.com/realSharonZhou/status/1286641375648321536

Twitter

Sharon Zhou on Twitter

“@ZimMatthias @mattlungrenMD @AureliaAugusta @adjiboussodieng @codyaustun @JvNixon @andrey_kurenkov We get the pretrained checkpoint here! Creds to the LSC-CNN folks for releasing their weights. It should be a google drive link if you navigate their README. https://t.co/TSVmE9mtnh”

https://jusletter-it.weblaw.ch/issues/2018/24-Mai-2018/pseudonym-und-anonym_e79f2f0405.html__ONCE&login=false

Weblaw AG

Sind «Namen Schall und Rauch»? Vermutlich sind sie dies eher nicht, vielmehr aufgeladen mit Bedeutungen, Zuschreibungen und Erwartungen, verbürgen Bonität oder das Gegenteil davon, all dies unter spezifischen Rahmenbedingungen. Das pseudonyme Verwenden eines anderen Namens kann es ermöglichen, di...

November 16, 2022

loleg:

14:29

https://confidentialcomputing.io/

Confidential Computing Consortium

Confidential Computing Consortium - Open Source Community

The Confidential Computing Consortium is focused on projects securing data in use and accelerating the adoption of confidential computing.

System

14:30

@loleg updated the channel header to:

GitHub - microsoft/Tools-for-Health-Data-Anonymization: Set of tools for helping with data (in FHIR format) anonymization.

Set of tools for helping with data (in FHIR format) anonymization. - GitHub - microsoft/Tools-for-Health-Data-Anonymization: Set of tools for helping with data (in FHIR format) anonymization.

https://amnesia.openaire.eu/

Amnesia Anonymization Tool - Data anonymization made easy

https://realrolfje.github.io/anonimatron/

Anonimatron

GDPR compliant testing.

The free, extendable, open source data anonymization tool.

Image Pasted at 2022-11-16 14-33.png

@jurek will you care to join an upcoming hackathon? This topic is really heating up here in Bern.

November 25, 2022

jurek:

07:46

In principle sure :slightly_smiling_face:. Although, I am quite busy at the moment. Are you at REJOHA? If yes maybe we could talk there.

loleg:

22:37

As usually am distracted by hundreds of things at a hackathon, only saw your reply now :slightly_smiling_face:

December 20, 2022

loleg:

20:13

https://opendp.org/

OpenDP

March 06

loleg:

23:53

https://github.com/microsoft/synthetic-data-showcase

GitHub

GitHub - microsoft/synthetic-data-showcase: Generates synthetic data and user interfaces for privacy-preserving data sharing and analysis.

Generates synthetic data and user interfaces for privacy-preserving data sharing and analysis. - GitHub - microsoft/synthetic-data-showcase: Generates synthetic data and user interfaces for privacy...

March 09

loleg:

10:03

https://learning.theodi.org/courses/anonymisation-is-for-everyone?utm_campaign=ODI%20Learning%20Course%20%7C%20Anonymisation%20is%20for%20Everyone&utm_medium=email&utm_content=248961231&utm_source=hs_email

Anonymisation is for everyone - ODI Learning

Our anonymisation course will introduce you to the latest thinking on the topic. Through practical exercises, we will develop your understanding.